
Univariate Models

Autoregressive model (AR model): a linear regression of the current value of
the series against its previous values.
- A pure AR model can be estimated with OLS.


Moving average model (MA model): a linear regression of the current value of the
series against current and previous white noise error terms (random shocks).
- MA models cannot be estimated with OLS; they are estimated by Maximum Likelihood.


Autoregressive moving average model (ARMA model): a linear regression of
the current value of the series against previous values of the series (AR
component) and current and previous white noise error terms (MA component).

Autoregressive integrated moving average model (ARIMA model)

ARIMA uses lagged values of the dependent variable and/or lagged random
disturbance terms as explanatory variables. Compared to ARMA, ARIMA involves
differencing the data.
Stationarity and models:
o AR may/may not be stationary (unless prior conditions are imposed)
o MA is always stationary: the white noise series satisfies E(εt) = 0, var(εt) = σ², cov(εt, εt−1) = 0
1. Stationarity: the probability laws governing the process do not change over
time
- the process is in statistical equilibrium
- the process is free of trend and seasonality
! mean, variance and autocorrelation structure do not change over time
Consequences of Non-stationarity:
1. Violation of CLRM assumption (OLS is not the best estimator anymore): infinite
variance
2. Wrong specification
3. Spurious regression

Strong stationarity: the probability distribution (expected value, variance) of the
stochastic process is invariant under a shift in time (say, shifting it by m
periods)
! means that the joint distribution does not depend on time
- does not imply weak stationarity (unless the first and second moments are finite)
Weak stationarity (covariance stationarity): the 1st and 2nd order moments are
unaffected by a change of time origin
- Covariance and correlation are functions only of the time difference
- The mean, variance and autocovariances of the random variables are constant
for all points in time.
Spurious regression: we frequently reject the null that b = 0 (i.e., that the
regression coefficient of the explanatory variable does not influence the dependent
variable), even when the null is correct
- an apparently strong (causal) relationship between variables that are not causally related
- often a consequence of regressing non-stationary variables on each other
- non-stationary residuals
- wrong distributional assumptions
Detection:
- very high R²
- high t-values
- highly significant results

Box-Jenkins refers to a set of procedures for identifying and estimating time-series models within the ARIMA class.
Steps:
1. Stationarity checking
2. Model identification
3. Parameter estimation
4. Diagnostic checking
5. Forecasting

Step 1. Stationarity checking:


a) Eye-balling
b) Inspection of autocorrelation function (correlograms)
- Statistical measures that indicate how a time series relates to itself over time.
- For stationary processes, the autocorrelation between any two observations
depends only on the time lag h between them.
- The time series y1, y2, …, yn can be considered stationary if its autocorrelation
function either cuts off fairly quickly or dies down fairly quickly.

c) Unit root tests: not part of the original Box-Jenkins procedure

Include: ADF, Phillips-Perron test
Correction:
- Differencing until stationarity is achieved. Examine stationarity again via
correlograms and unit root tests (see the sketch below).
- Order of integration = number of times a series must be differenced to
become stationary
- In practice: 2nd differences should be enough
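A minimal sketch of this correction step in Python (statsmodels/pandas, not part of the original notes); the series name `y` is illustrative:

```python
# Difference until stationary, checking an ADF test and the correlogram at each step.
import pandas as pd
from statsmodels.tsa.stattools import adfuller
from statsmodels.graphics.tsaplots import plot_acf

def difference_until_stationary(y: pd.Series, max_d: int = 2, alpha: float = 0.05):
    """Return the differenced series and its order of integration d (at most max_d)."""
    series = y.copy()
    for d in range(max_d + 1):
        pvalue = adfuller(series.dropna(), autolag="AIC")[1]   # H0: unit root
        if pvalue < alpha:
            return series, d                                   # stationary at order d
        series = series.diff()                                 # difference once more
    return series, max_d

stationary_y, d = difference_until_stationary(y)
plot_acf(stationary_y.dropna())   # correlogram should cut off or die down quickly
```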

Step 2: Model identification:


The model should be as parsimonious as possible:
- including irrelevant lags in the model increases standard errors and reduces the
t-statistics (making hypothesis testing more difficult)
- models that incorporate large numbers of lags tend not to forecast well.

Partial autocorrelations (PACF) are used to measure the degree of association between Yt and Yt-k
when the effects of the other time lags (1, 2, 3, …, k−1) are removed.
The PACF identifies the extent of the lag, i.e. the order of an AR model.
To identify the MA lag structure, we use the ACF.
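A short illustrative sketch of this identification step (assuming an already stationary series `y`):

```python
# Inspect ACF and PACF: the PACF suggests the AR order, the ACF the MA order.
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

fig, axes = plt.subplots(2, 1)
plot_acf(y, lags=20, ax=axes[0])    # cut-off at lag q hints at an MA(q) component
plot_pacf(y, lags=20, ax=axes[1])   # cut-off at lag p hints at an AR(p) component
plt.show()
```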

Step 3: Parameter estimation


- AR: OLS; MA/ARMA: Maximum Likelihood (ML)
ML: those values of the parameters/regression coefficients for which the actually
observed data are most likely, i.e., the values that maximize the likelihood
function L
- The likelihood is a function of the intercept, the coefficients and the variance of the error term,
given n observations y1, y2, …, yn of the output variable
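A minimal sketch of ML estimation with statsmodels (not part of the original notes; `y` and the order (1, 0, 1) are illustrative):

```python
# Fit an ARMA(1,1) by maximum likelihood; ARIMA(p, d, q) with d = 0 is an ARMA(p, q).
from statsmodels.tsa.arima.model import ARIMA

res = ARIMA(y, order=(1, 0, 1)).fit()
print(res.summary())        # coefficients, standard errors, log-likelihood, AIC/BIC
```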


Step 4: Diagnostic Checking:


The Box-Jenkins method does not automatically yield a single best model. We need
to compare candidate models:
t-tests for coefficient significance (coefficients that add explanatory power should be retained)
Residual analysis (the residuals should be uncorrelated). If they are correlated:
there is systematic movement in the output not accounted for by the ARMA model
Portmanteau test (Ljung-Box): tests for the presence of serial correlation in the
residuals
H0: residuals are independently distributed (i.e. any observed correlations in the
data result from the randomness of the sampling process)
We do not want to reject the null of the Ljung-Box test.

Goodness-of-fit and model selection criteria

a) Adjusted R²: the percentage of the variation of Y around its mean that is
explained by the regression equation, adjusted for degrees of freedom. Higher
values are better.
b) AIC and BIC should be as small as possible.
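An illustrative sketch of these diagnostic checks (assuming a fitted statsmodels ARIMA result `res` as in the sketch above):

```python
# Ljung-Box test on the residuals and information criteria for model comparison.
from statsmodels.stats.diagnostic import acorr_ljungbox

lb = acorr_ljungbox(res.resid, lags=[10], return_df=True)
print(lb)                      # large p-value => do not reject H0 of uncorrelated residuals
print(res.aic, res.bic)        # compare candidate models: smaller is better
```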
Step 5: Forecasting:

- Forecasts from an ARMA model are not perfectly accurate

- The quality of forecasts diminishes with the number of steps ahead into the future
- Forecasts can be either in-sample or out-of-sample (the latter is a better test of how the
model works, since it uses data not included in the model estimation). One may
leave some observations at the end of the sample for this purpose
- The model that fits the data best does not necessarily have the best forecast
performance
Evaluation: plot the forecast against the actual (observed) values:
- direct comparison of patterns
- comparison of turning points
Measures of forecast accuracy:
1. Mean squared error (smaller is better)
2. Mean absolute error
3. Mean average prediction error
4. Theil's U statistic (the forecast is compared to a benchmark model): forecasts that
outperform the benchmark should be preferred
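A minimal out-of-sample evaluation sketch (not from the notes; `y`, the 80/20 split and the ARIMA order are illustrative assumptions):

```python
# Hold out the end of the sample, forecast it, and compute simple accuracy measures.
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

split = int(len(y) * 0.8)
train, test = y[:split], y[split:]

res = ARIMA(train, order=(1, 0, 1)).fit()
forecast = res.forecast(steps=len(test))

mse = np.mean((np.asarray(test) - np.asarray(forecast)) ** 2)   # mean squared error
mae = np.mean(np.abs(np.asarray(test) - np.asarray(forecast)))  # mean absolute error
```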
Problems with the BJM:
1. Ad-hocism: reliance on one's judgement and experience
2. Non-stationarity:
a) The first-difference filter does not work for trends, structural breaks and
cointegration
b) Cointegration: a long-run equilibrium exists between a set of variables. Even
though the individual variables may be non-stationary, it is possible for linear
combinations of them to be stationary.
c) Simple de-trending of time series by first-differencing may lead to spurious
regression results when a long-run (cointegrating) relationship exists. This may be
modeled by an ECM.
- The issue of cointegration arises only for AR and ARMA models that also include
control variables.
3. Seasonality
4. Structural breaks: one assumption is that the data structure does not change
(constant coefficients). Otherwise: parameter instability.
! there is a sudden change in the time series and/or in the relationship between time series

Chow test: used when the structural break date is known (see the sketch after this list)

H0: no structural break at the given point in time
- Fit the ARMA model for pre- and post-break data
- A sufficient difference between the models indicates the presence of a structural
break (large Chow test statistic and rejection of the null)
- The largest Chow test statistic indicates the actual break date (endogenous break)
For the Chow test the break date must be known. But breaks may not fully manifest
themselves at specific dates; they may emerge gradually =>
CUSUM test (detects more gradual changes)
A series of one-step-ahead forecast errors is obtained by running a series of
regressions, with the first regression using the first k observations (k = number
of parameters in the original model) to generate a prediction of the dependent
variable at observation k+1, etc. (recursive estimation). The cumulated sum of
forecast errors is plotted.
H0 is rejected at 5% if the cumulated sum strays outside the confidence band.
5. Other problems: heteroskedasticity and other specification errors.
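A hypothetical sketch of a simple Chow-type test for a known break date, assuming an AR mean model estimated by OLS (the names `y` and `break_idx` are illustrative, not from the notes):

```python
# Compare a pooled AR(lags) fit with separate pre-/post-break fits via an F-test.
import pandas as pd
import statsmodels.api as sm
from scipy import stats

def chow_test(y: pd.Series, break_idx: int, lags: int = 1):
    """Large F / small p-value suggests a structural break at break_idx."""
    df = pd.concat({"y": y, **{f"lag{i}": y.shift(i) for i in range(1, lags + 1)}},
                   axis=1).dropna()
    X = sm.add_constant(df.drop(columns="y"))
    rss_pooled = sm.OLS(df["y"], X).fit().ssr
    pre = df.index < y.index[break_idx]
    post = ~pre
    rss_split = (sm.OLS(df["y"][pre], X[pre]).fit().ssr
                 + sm.OLS(df["y"][post], X[post]).fit().ssr)
    k = X.shape[1]                                      # parameters per regime
    f_stat = ((rss_pooled - rss_split) / k) / (rss_split / (len(df) - 2 * k))
    p_value = stats.f.sf(f_stat, k, len(df) - 2 * k)
    return f_stat, p_value
```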

Fundamentals of the BJM:


1. Parsimony: Model selection criteria
2. Stationarity and invertibility: Examination of correlogram
- An MA process is said to be invertible if it can be converted into a stationary AR
process of infinite order
- MA processes are not invertible unless appropriate prior conditions are imposed
on the parameters
Pure AR(p): always invertible
Pure MA(q), ARMA(p,q): may or may not be invertible
- Non-invertible models cannot be estimated with the Box-Jenkins method
3. Good fit of data: Goodness of fit measurements
4. White noise approximation of residuals: examination of residuals (Q- stat)
5. No coefficient instability: (test for structural breaks (ex: Chow test)
6. Good out-of-sample forecasts: examination of forecasting performance
! also look at t-test for coefficient significance

Topic 5: ARCH/GARCH models

ARCH (and GARCH) models treat heteroskedasticity as a variance to be modeled
- simultaneous modeling of the mean and the variance

ARCH: the conditional variance of the disturbance of the output variable (the
squared fitted residuals) follows an AR process
-> The significant lags of that AR process give its order

A larger a1 (mean equation): more persistent changes in yt and a stronger tendency to remain away from
its mean
A larger α1 (variance equation): a more persistent shock in the variance sequence
Drawbacks: high persistence of the variance => a high ARCH order is needed
- Only positive coefficients are allowed in the variance model
GARCH: models the conditional variance as an ARMA process => it may be more
parsimonious

- the significant lags of the ARMA process give the order of the GARCH model


εt = vt·√ht, where vt is a white noise process
ht = a0 + Σ ai·ε²(t−i) + Σ bi·h(t−i)
(the ai·ε²(t−i) terms form the MA(q)/ARCH part, the bi·h(t−i) terms the AR(p)/GARCH part)
a1 = size of the shock
b1 = small/large autoregressive persistence
1. Testing:
1. Use OLS to estimate the most appropriate regression
2. Store the squares of the fitted errors (residuals)
3. Regress these squared residuals on a constant and q lagged values of
themselves (see the sketch below)
H0: ARCH errors are not present and do not need to be modeled
H1 (rejection of the null; large t-statistics and low p-values): ARCH errors are likely present
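A minimal sketch of this ARCH-LM test; `resid` is assumed to hold the residuals of an already-estimated mean model (names illustrative, argument names may differ slightly across statsmodels versions):

```python
# Built-in Engle LM test plus a manual version mirroring steps 2-3 above.
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_arch

lm_stat, lm_pval, f_stat, f_pval = het_arch(resid, nlags=4)   # small p => ARCH effects

e2 = pd.Series(resid) ** 2
lags = pd.concat({f"lag{i}": e2.shift(i) for i in range(1, 5)}, axis=1).dropna()
aux = sm.OLS(e2.loc[lags.index], sm.add_constant(lags)).fit()
lm_manual = aux.nobs * aux.rsquared    # LM statistic ~ chi2(q) under H0 of no ARCH
```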
2. Goodness of fit: examination of the standardized residuals [they should have
mean 0 and variance 1]
- inspection of the correlogram of the standardized residuals for remaining serial correlation
- inspection of the correlogram of the squared standardized residuals for remaining volatility
Model comparison: choose the smallest AIC and BIC and the largest log-likelihood.
3. Forecasting: the same as in the BJM
- forecast the volatility first, then the process
- To check for accuracy: plot the forecast against the actual (observed) values of
the time series
Measures: Mean Squared Error, Mean Absolute Error, Mean Average Prediction
Error
(G)ARCH models cannot be estimated using OLS, since OLS does not model the
variance =>
Maximum Likelihood: assume a variable x follows the normal distribution, but its
mean and standard deviation are unknown. We maximize the likelihood function
to find them.
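A minimal sketch using the third-party arch package (not part of the notes); `returns` is an assumed series of returns, and the AR(1)-GARCH(1,1) specification is illustrative:

```python
# AR(1) mean with GARCH(1,1) variance, estimated by maximum likelihood.
from arch import arch_model

model = arch_model(returns, mean="AR", lags=1, vol="GARCH", p=1, q=1)
res = model.fit(disp="off")
print(res.summary())                  # check significance of alpha/beta and AIC/BIC
fcast = res.forecast(horizon=5)       # variance forecasts for 1..5 steps ahead
print(fcast.variance.iloc[-1])
```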

Fat tails: higher probability of losses/gains than indicated by the normal
distribution. In this case, we use the t-distribution.

! Make sure that:


1. no serial correlation (model of mean is appropriate)
2. parsimonious
3. if there are structural breaks (highly persistent conditional volatility): they can be captured
with a dummy variable indicating the structural break. Perform a Chow test
Other models:
1. IGARCH (the AR part has a unit root => the conditional variance acts like a unit root process)
2. TARCH: leverage effect: bad news has a larger effect than good news; models asymmetric
shocks
3. EGARCH: also for asymmetric shocks; the estimated coefficients can be
negative

Gretl:
1. Inspection of summary statistics: a small mean, large maximum and minimum, and a
large standard deviation point to conditional heteroskedasticity
2. Graph inspection: tranquility until t = 50, turbulence after that point =>
an ARCH model may be appropriate

3. Regress OLS with lags

4. Plot the correlograms and check:


- the default lag lengths
- presence of serial correlation
! the residuals are not serially autocorrelated, but their squares may follow an AR(p)
For ARCH: only the ACF is relevant; for GARCH: we check the PACF as well
When both the ACF and PACF die down: use GARCH
5. LM test for ARCH errors: OLS -> Tests -> ARCH. If the LM statistic is large and significant
-> reject the null of no ARCH errors -> model the ARCH(1) error

6. Fitting (G)ARCH: the model of the mean is AR(1); the model of the variance is
ARCH(1) or GARCH(0,1)

7. Obtain residuals and squared residuals


! residuals: should show no sign of serial correlation
! squared residuals: should show no sign of remaining volatility
- If the variance is not 1 yet, another (G)ARCH specification may be more suitable, but we need to
keep the model parsimonious
8. Forecasting: confidence intervals become larger with larger forecast horizon

Topic 6: Models with Trends


Components of Time Series:
yt = trend + stationary component + noise
Stationary component: modeled with BJ
Noise: ARCH/ GARCH
Trends:
a. Deterministic trend + noise: E(Yt) − E(Yt-1) = b
The trending variable changes by a constant amount (b) each period
Yt = b0 + b1·t + εt; t = time trend, εt = white noise
- The mean grows around a fixed trend, which is constant and independent
of time.
- Always reverts to the trend in the long run (the effects of shocks are eliminated)
Also known as: trend-stationary processes
b. Stochastic trend: E(Yt) − E(Yt-1) = a + vt;
a = constant, vt = random amount
The trending variable changes by a random amount each period plus a constant
(a random but permanent shock to the series)
Also called: difference-stationary or unit root processes
c. Random walk + noise: Yt = Yt-1 + εt
the value of the time series at time t equals the last period's value plus a
stochastic component that is white noise
- It is a non-mean-reverting process that can move away from the mean in a
positive or negative direction
- The variance evolves over time and goes to infinity as time goes to infinity => a RW
cannot be predicted
d. RW with a drift (+ noise)

predicts that the value of the time series at time t will equal the last period's value
plus a constant (drift) and a white noise term
Yt = a + Yt-1 + εt
- contains both a deterministic (constant) and a stochastic trend
It does not revert to the long-run mean and its variance depends on time
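An illustrative simulation of the trend cases above (not from the notes; the parameter values b0, b1, a are arbitrary):

```python
# Simulate a trend-stationary series, a random walk, and a random walk with drift.
import numpy as np

rng = np.random.default_rng(0)
n, b0, b1, a = 200, 1.0, 0.5, 0.2
eps = rng.normal(size=n)

t = np.arange(n)
deterministic = b0 + b1 * t + eps        # reverts to the deterministic trend line
random_walk = np.cumsum(eps)             # Yt = Yt-1 + eps_t
rw_with_drift = np.cumsum(a + eps)       # Yt = a + Yt-1 + eps_t
```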

Removing trends:
a) Differencing: a series with a unit root is transformed into a stationary series.
The number of times a series is differenced = order of integration I(d). Usually at most
I(2).
Recall: the first difference of log-level data approximates the growth rate of the
log-level data
b) Detrending: a series with a deterministic trend can be transformed into a
stationary series by removing the trend.
- Use OLS regression to identify the trend component and then subtract the
trend component from the original data series.
- Residuals from the regression on a trend = detrended time series

Differencing vs. detrending

- Detrending is not suitable for unit root series: it does not eliminate the stochastic portion of
the series
- Differencing introduces a unit root process into a trend-stationary series (over-differencing)
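A minimal sketch contrasting the two corrections (not from the notes; `y` is an assumed pandas Series):

```python
# Detrend by regressing on a time trend (trend-stationary case) vs. difference (unit root case).
import numpy as np
import statsmodels.api as sm

t = np.arange(len(y))
trend_fit = sm.OLS(y, sm.add_constant(t)).fit()
detrended = trend_fit.resid            # appropriate for trend-stationary series

differenced = y.diff().dropna()        # appropriate for unit-root (difference-stationary) series
```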
Detecting trends:
1. Eye-balling: a stationary series should have a constant mean, constant variance and no visible
long-run trend
2. Inspection of the correlogram: indicates how a time series is related to itself over
time
Stationary if: the ACF cuts off / dies down fairly quickly
3. t-test for Yt = b0 + b1·Yt-1 + εt
H0: b1 = 1 (non-stationary), critical value = 1.65
Ha: b1 < 1 (stationary)
Problem: the estimate of b1 may be downward biased, especially in small samples (because of the
use of lagged y)
=> We want TO REJECT!
4. Unit root tests:
a. Dickey-Fuller test: for an AR(1) process with a drift: Yt = b0 + b1·Yt-1 + εt
Estimate using OLS and examine the estimated b1:
H0: b1 = 1 (non-stationary)
Ha: b1 < 1 (stationary)
DF suggest subtracting Yt-1 from both sides: Yt − Yt-1 = b0 + b1·Yt-1 − Yt-1 + εt
ΔYt = b0 + γ·Yt-1 + εt, where γ = b1 − 1
H0: γ = 0 (non-stationary, a unit root exists)
Ha: γ < 0 (stationary) => we want TO REJECT

! We test for a random walk using an F-test (joint hypothesis)


b. ADF test: an extension that accommodates higher-order AR processes by including lags of
the dependent variable
The final lag should correspond to a parsimonious model:
- too few lags: the residuals do not behave like white noise (e.g. autocorrelation is present)
- too many lags: reduces the power of the test to reject the null
Selection of the lag length:
- General-to-specific approach: start with many lags and pare down by means of
F/t tests
- minimize AIC/BIC
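A minimal sketch of an ADF test with automatic lag selection (not from the notes; `y` is an assumed series):

```python
# H0: unit root (non-stationary). A small p-value => reject H0 => treat the series as stationary.
from statsmodels.tsa.stattools import adfuller

stat, pvalue, usedlag, nobs, crit, icbest = adfuller(y, regression="c", autolag="AIC")
print(stat, pvalue, usedlag)
```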

Extensions and Problems:


1. Structural breaks: the DF test is biased toward accepting the null (even when the pre- and
post-break periods are stationary)
This can lead to a wrong model-fitting decision.
- Eye-balling
- Zivot-Andrews is a unit root test that allows for a single, unknown break in the trend
and/or intercept
- If breaks are present: include dummy variables to account for them when testing for a
unit root
2. Power of unit root tests: the probability of correctly rejecting the null
hypothesis.
DF/ADF often wrongly indicate that a series contains a unit root. They have little
power to distinguish between trend-stationary and unit root processes.

- The closer a1 is to 1, the lower the power of the test

Type I error: rejecting the null when the series is non-stationary
Type II error: accepting the null when the series is stationary

Tests with more power:


- URT that allow for structural breaks (ex: Zivot Andrews)
- URT that model more efficiently deterministic components that may include a
UR process (ex: Schmidt Phillips)
- Panel URT
It may be useful to compare results of different tests
Cointegration: arises when more than one variable is considered.
- may lead to spurious regression results when a long-run (cointegrating)
relationship exists
Detection: prior unit root tests to check whether the series are non-stationary
- Cointegration tests (e.g. Engle-Granger)
! needs to be handled using an error-correction model (see the sketch below)
Remedy:
1. Use unit root tests to determine the order of integration
2. Run the cointegrating regression
3. Apply an appropriate unit root test to the residuals from this regression to test for
cointegration
4. If cointegration is accepted: use the lagged residuals from the cointegrating regression
as an error-correction term in an ECM.
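A hypothetical sketch of this remedy: build the error-correction term from the lagged residuals of the cointegrating regression (the names `y` and `x` are illustrative, not from the notes):

```python
# Step 2: long-run (cointegrating) regression, then an ECM in first differences.
import pandas as pd
import statsmodels.api as sm

coint_reg = sm.OLS(y, sm.add_constant(x)).fit()
ect = coint_reg.resid.shift(1)                          # lagged residuals = error-correction term

ecm_data = pd.DataFrame({"dy": y.diff(), "dx": x.diff(), "ect": ect}).dropna()
ecm = sm.OLS(ecm_data["dy"], sm.add_constant(ecm_data[["dx", "ect"]])).fit()
# The coefficient on `ect` should be negative: speed of adjustment back to equilibrium.
```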

Topic 7: Multi-equation Time Series Models


Single-equation models: the dependent variable is modeled using lags of itself -> AR.
The model may contain MA terms to improve fit and parsimony. If differencing is involved:
ARIMA.

- If there is heteroskedasticity: specify a (G)ARCH variance model, while the mean
remains an ARIMA model
- Used for forecasting
- Share little relationship with economic theory; include only uninformative
components: lagged DV, time trends, constants, dummy variables => potential
disconnect between statistical modeling and economic theory
Multi-equation models: may still include an AR component, constant, trends and
other dummies, but also consider truly explanatory variables.
- relationships suggested by economic theory can be tested empirically
- policy intervention analysis
- cause and effect analysis
Benefits: - reduced omitted variable bias
- improved goodness of fit
- improved forecasting
- availability of further tools of time-series analysis (ex: causality analysis,
variance decomposition)
Types:
I. Intervention analysis: a statistical technique that generalizes the univariate
time-series methodology
- the time path of the dependent variable is influenced by the time path of an independent
variable
- no feedback
- used for forecasting and various intervention analyses

z = intervention variable: a pure jump or other functional forms


It evaluates the initial and long-run effects of a specific policy
Initial effect: given by the magnitude, sign and statistical significance of z
Long-run effect: equal to the long-run mean after the intervention minus the
value of the original (pre-intervention) mean
Disadvantages: - Cannot capture feedback
- Difficult to differentiate between dependent and independent variables
- (In the presence of feedback) estimates are biased, because Y is correlated with the error term
II. VAR models (vector autoregressive models) are used for multivariate time
series. Each variable is a linear function of past lags of itself and past lags of the
other variables. So, it contains more than one dependent variable. All variables
are endogenous.

i. Characteristics/ Assumptions:
1. Stationarity: all parameters are stable
2. Linearity: relationship between variables as linear regression

3. Reduced form: VAR models only past (lagged) variables


Primitive VAR (with contemporaneous terms) -> algebraic transformation ->
reduced VAR
4. Symmetry: all variables enter VAR with the same lag length
5. Finite-order VAR: number of lags is finite (data constraints)
6. Forecast errors: shocks can be given an economic interpretation, modeling
dynamic response of shocks and identification of shocks
! error terms are homoscedastic and serially uncorrelated
Forecasting: for macro variables, future prices of securities
- Forecasts from an unrestricted VAR may be unreliable (over-parameterized models
or insignificant coefficients) => a Near-VAR model may have improved forecast
performance
ii. Identification:
a. Primitive/structural VAR (with contemporaneous independent variables): cannot
be estimated directly because the contemporaneous variables are correlated with the
error terms
- Tied to a particular theory
- Better interpretation of forecast
b. Reduced/ unrestricted VAR (without contemporaneous i.v.) can be estimated
using OLS
- Statistical description of data
- Compatible with lots of theories
- Good short-term forecast
From Reduced to Structural VAR:
(1) Estimation of a reduced (unrestricted) VAR
(2) Impose suitable theoretical restrictions: recursive system, coefficient,
variance, symmetry and other long-run restrictions

iii. Estimation:
VAR using OLS: the regressors are identical for each equation in the VAR system
(symmetry = same lag length) + homoscedastic and serially uncorrelated
errors
Near-VAR (e.g. SUR): allows for error correlation across equations
- more efficient when the right-hand-side variables are not identical across equations
So OLS may suffer if these assumptions are violated
iv. Tools:

1. Impulse responses: the response of the dependent variables to shocks to the error terms, where a
shock is applied to each equation and its effects are traced out
=> assumes that the error returns to zero in subsequent periods and that all other errors
are zero
! Measures how long and to what degree a shock to a given equation matters for
the variables in the system

2. Variance decomposition: analyzes the relative importance of variables. It gives
information about the relative importance of each shock to the variables in the
VAR.
- Percentage of the forecast error variance attributable to each variable (like a partial
R²)

Ordering (placing variables in decreasing order of relative exogeneity) is
important because:
- VAR errors may not be statistically independent
Granger causality: past values of X provide statistically significant
information about future values of Y (X Granger-causes Y); see the sketch below
H0: no Granger causality
Assess significance with the joint F-test; individual t-tests may be biased due to collinearity
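A hedged sketch of these VAR tools with statsmodels (not from the notes; `data` is an assumed DataFrame of stationary series with illustrative column names "y1" and "y2"):

```python
# Fit a VAR, then compute impulse responses, variance decomposition and Granger causality.
from statsmodels.tsa.api import VAR

model = VAR(data)
res = model.fit(maxlags=4, ic="aic")        # lag length chosen by information criterion
irf = res.irf(10)                           # impulse responses, 10 periods ahead
irf.plot(orth=True)                         # orthogonalized => the variable ordering matters
fevd = res.fevd(10)                         # forecast error variance decomposition
fevd.summary()
gc = res.test_causality("y1", ["y2"], kind="f")   # H0: y2 does not Granger-cause y1
print(gc.summary())
```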

Problems:
No a priori assumptions about exogenous/endogenous variables; the VAR may not be
complete (omitted variable bias)
1. Lag length selection: an incorrect lag length may affect the variance decomposition
2. Over-fitting (rule of thumb: no more than 4 endogenous variables; use more data)
=> use information criteria for choosing the right number of lags. The MAIC is more conservative
than the simple AIC (less danger of over-parameterization)
3. Non-stationarity: VAR estimation is usually accompanied by unit root tests
Cases and VAR specification:
Case 1. All variables are stationary -> VAR in levels (non-transformed data)
Case 2. Variables integrated of different orders -> VAR meaningless
Case 3. The same order of integration, but NOT cointegrated ->VAR in
differences (otherwise: spurious regression)
Case 4: The same order of integration and cointegrated -> VAR with error
correction model
(1) Use URT to determine order of integration
(2) Run cointegrating regression
(3) Use URT to the residuals to test for cointegration
(4) If cointegration is accepted, use lagged residuals from the cointegrated
regression as an error correction term in ECM
4. Other issues: structural breaks, outliers, etc.

Topic 8: Cointegration and VECM


Cointegration refers to a situation where there is a linear combination of
integrated (i.e., non-stationary) time-series variables (within an econometric
model) that is stationary
Consequences of (ignoring) cointegration:
1. Spurious regressions (cointegrated variables are not individually stationary)
- oftentimes the consequence of regressing non-stationary variables on each
other
2. Misspecification error: using a VAR in differences rather than in levels.
A VAR in differences ignores the information that the levels of the variables
cannot move independently of each other => no long-run equilibrium
- inferior forecasting performance
- may bias hypothesis testing, e.g. Granger causality
- cannot differentiate between short-run and long-run dynamics
Choosing the correct model: 4 cases (as in Topic 7)
VECM is a special case of a VAR. The error-correction term refers to the linear
combination of the two variables (in levels) that is stationary.
Granger Representation Theorem: if two series are cointegrated, the short-run
disequilibrium can be expressed in error-correction form (which incorporates the
long-run information into the model)
Error-correction term: represents the speed at which the model returns to
equilibrium following a shock. It should be negatively signed (a positive sign implies movement
away from equilibrium)
Error correction model = unrestricted VAR
Error correction term = structural VAR
Cointegration tests
1. Engle-Granger method: used for a two-variable system in which the variables are possibly I(1)
and cointegrated.
It assesses whether the residual of an estimated equilibrium relationship is stationary.
(1): Pretest the considered variables for their order of integration: the variables need
to have the same order of integration. We use the ADF test:
- If I(d): go to step 2
- If I(0): use levels

(2): Estimation of the long-run relationship: run OLS: Yt = b0 + b1·zt + εt

- If cointegrated: there is a strong linear relationship between the two due to a common
trend. The residuals contain the estimated deviations from the long-run relationship
(3): Save the residuals from (2)
(4): Test whether this residual has a unit root, using the Engle-Granger test, which uses
different critical values than the ADF test (see the sketch below)
H0: no cointegration
Ha: cointegration
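A minimal sketch of the Engle-Granger test with statsmodels (not from the notes; `y` and `z` are assumed I(1) series):

```python
# H0: no cointegration. A small p-value => reject H0 => y and z are cointegrated.
from statsmodels.tsa.stattools import coint

stat, pvalue, crit = coint(y, z)
print(stat, pvalue)
```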

b) The Johansen test is more generally applicable: it tests several I(1) time series and
allows for more than one cointegrating relationship. It is based on maximum
likelihood estimation and two statistics: the maximum eigenvalue and the trace
statistics.
H0 for the trace statistic: number of cointegrating vectors r ≤ r0 (a hypothesized number)
H0 for the maximum eigenvalue statistic: r = r0
One can thus assess the number of cointegrating relationships to be modeled.
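A hedged sketch of the Johansen rank selection and a VECM fit, assuming a recent statsmodels version (`data` is an assumed DataFrame of I(1) series in levels; the settings are illustrative):

```python
# Select the cointegration rank by the trace statistic, then fit a VECM with that rank.
from statsmodels.tsa.vector_ar.vecm import select_coint_rank, VECM

rank = select_coint_rank(data, det_order=0, k_ar_diff=1, method="trace")
print(rank.summary())                      # number of cointegrating relationships

vecm = VECM(data, k_ar_diff=1, coint_rank=rank.rank, deterministic="co").fit()
print(vecm.summary())                      # alpha = adjustment (error-correction) coefficients
```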

(1) Theory guiding the statistical analysis, and good data

(2) Stationarity analysis: eye-balling, ACF correlograms, unit root tests (lags,
specification, power)
(3) Cointegration analysis: tests considering lag length, specification of the
cointegration tests and the different choices
(4) Model fit: choose the adequate multi-equation model (4 types)
(5) Diagnostic checks:
- Lag length selection for the VAR/VECM
- Overall goodness of fit
- F-test for joint significance (Granger causality)
- t-test for individual regression coefficients
- Possible elimination of insignificant lags
- Sign and size of the error-correction term
- Forecast performance and model comparison with information criteria
- Presence of multicollinearity, autocorrelation and other possible OLS
assumption violations
(6) Analysis of findings:
- Short-run dynamics (regression coefficients)
- Long-run dynamics (coefficient associated with the error-correction term)
- Granger causality, hypothesis testing, impulse responses
- Connection to theory
