
ADVANCED FORECASTING MODELS

USING SAS SOFTWARE


Girish Kumar Jha
IARI, Pusa, New Delhi 110 012
gjha_eco@iari.res.in
1. Transfer Function Model
Univariate ARIMA models are useful for the analysis and forecasting of a single time series. In such
situations, we can only relate the series to its own past and do not explicitly use the information
contained in other pertinent time series. In many cases, however, a time series is not only related
to its own past, but may also be influenced by the present and past values of other time series.
Models that can accommodate such situations are referred to as transfer function models
(Box et al., 1994). Transfer function models, which are extensions of the familiar linear regression
models, have been widely used in various fields of research. In macroeconomics, transfer
function models can be used to study the dynamic interrelationships among the variables in an
economic system. In marketing, these models are used to determine the factors, such as
advertisement, competition, or economic conditions, that may affect the sales of certain products.
Because of their close relationship with regression models, transfer function models are also
referred to as dynamic regression models (Pankratz, 1983).
The transfer function approach to modeling a time series is a multivariate way of modeling the
various lag structures found in the data. It is similar to a distributed lag model in traditional
econometrics. Transfer function models may appear closely related to multiple regression (OLS),
but they differ from the regression model in the way they use the explanatory variables to
forecast the dependent variable. Simple transfer function models assume a contemporaneous
relationship between the explanatory variables and the dependent variable (the forecast of the
explanatory variable at time t+1 explains the behavior of the dependent variable at time t+1).
General transfer function models extend the simple approach to include previous, or lagged,
values of the explanatory variables (for example, the forecast of the explanatory variable at
time t+1 can be used to explain the behavior of the dependent variable at time t+2). Transfer
function models can use more than one explanatory variable, but the explanatory variables must
be linearly independent of each other.
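In symbols, the two cases can be sketched as follows. The notation is an assumption borrowed from the usual Box-Jenkins convention rather than taken from this text: $y_t$ is the dependent series, $x_t$ the explanatory series, $B$ the backshift operator ($B x_t = x_{t-1}$), $n_t$ an error process that may itself follow an ARMA model, and $b$ a pure delay.

```latex
% Simple transfer function: contemporaneous effect only
y_t = \mu + \omega_0 x_t + n_t

% General transfer function: lagged effects through a ratio of polynomials in B
y_t = \mu + \frac{\omega(B)}{\delta(B)} B^{b} x_t + n_t,
\qquad
\omega(B) = \omega_0 - \omega_1 B - \cdots - \omega_s B^{s},
\quad
\delta(B) = 1 - \delta_1 B - \cdots - \delta_r B^{r}
```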
Transfer function models use forecast values of the explanatory variables to forecast the
values of the dependent variable. The variability in the forecasts of the explanatory variables is
incorporated into the forecasts of the dependent variable. To model the dependent variable with
a simple transfer function model, we need to perform more tasks than are required for the
regression model. The following steps are involved in modeling a simple transfer function:
(i) Identify a model to describe the explanatory variables.
(ii) Estimate the model for the explanatory variables.
(iii) Identify and estimate the regression model for the dependent variable, using the
explanatory variables and an appropriate error process.
(iv) Forecast the dependent variable.
Thus, we have to model the explanatory variables before using them to model the dependent
variable, and then forecast with the transfer function model. Forecasting with a regression model
does not require any modeling of the explanatory variables.
1.1 Example of Transfer Function Model
For example, suppose we want to model the effect of an advertising campaign on sales. As we
know, the effect of an advertising campaign lasts for some time beyond the end of the campaign.
Hence, monthly sales figures (y) may be modeled as a function of the advertising expenditure (x)
in each of the past few months. We will model the sales series as a regression against the
advertising expenditure from the current month and the past few months. We can use PROC
ARIMA to fit a simple transfer function model. This is illustrated by the following SAS
statements:
/* Monthly sales (y) and advertising expenditure (x) for 36 months */
title1 "Estimate the Model for the dependent Variable";
title2 "t=time y=sale volume x = advertising expenditure";
data sale;
input t y x;
datalines;
 1 12.0 15
 2 20.5 16
 3 21.0 18
 4 15.5 27
 5 15.3 21
 6 23.5 49
 7 24.5 21
 8 21.3 22
 9 23.5 28
10 28.0 36
11 24.0 40
12 15.5  3
13 17.3 21
14 25.3 29
15 25.0 62
16 36.5 65
17 36.5 46
18 29.6 44
19 30.5 33
20 28.0 62
21 26.0 22
22 21.5 12
23 19.7 24
24 19.0  3
25 16.0  5
26 20.7 14
27 26.5 36
28 30.6 40
29 32.3 49
30 29.5  7
31 28.3 52
32 31.3 65
33 32.2 17
34 26.4  5
35 23.4 17
36 16.4  1
;
proc print data=sale;
run;
/* Regress y on the current and the three lagged values of x */
proc arima data=sale;
identify var=y crosscorr=(x) noprint;
estimate input=((1 2 3)x);
run;
The output for the above SAS code is given below:

Estimate the Model for the dependent Variable
t=time y=sale volume x = advertising expenditure

The ARIMA Procedure

Conditional Least Squares Estimation

                         Standard              Approx
Parameter    Estimate       Error   t Value   Pr > |t|   Lag   Variable   Shift
MU           13.61539     1.87392      7.27     <.0001     0   y              0
NUM1          0.14644     0.03597      4.07     0.0003     0   x              0
NUM1,1       -0.15063     0.03920     -3.84     0.0006     1   x              0
NUM1,2       -0.05018     0.03898     -1.29     0.2085     2   x              0
NUM1,3       -0.02720     0.03770     -0.72     0.4766     3   x              0

Constant Estimate       13.61539
Variance Estimate       13.72087
Std Error Estimate      3.704169
AIC                     184.6522
SBC                     192.1348
Number of Residuals           33
* AIC and SBC do not include log determinant.

Correlations of Parameter Estimates

Variable                     y        x        x        x        x
            Parameter       MU     NUM1   NUM1,1   NUM1,2   NUM1,3
y           MU           1.000   -0.403    0.312    0.261    0.477
x           NUM1        -0.403    1.000    0.322   -0.074    0.015
x           NUM1,1       0.312    0.322    1.000   -0.333    0.119
x           NUM1,2       0.261   -0.074   -0.333    1.000   -0.322
x           NUM1,3       0.477    0.015    0.119   -0.322    1.000

Autocorrelation Check of Residuals

To Lag   Chi-Square   DF   Pr > ChiSq   -----------------Autocorrelations-----------------
     6        12.80    6       0.0463    0.482    0.191    0.248    0.110    0.016    0.071
    12        17.73   12       0.1240    0.248    0.192    0.041   -0.008    0.000   -0.087
    18        19.54   18       0.3592   -0.067   -0.050   -0.134    0.011    0.048   -0.031
    24        48.37   24       0.0023   -0.042   -0.036   -0.152   -0.367   -0.289   -0.140

Model for variable y

Estimated Intercept    13.61539

Input Number 1
Input Variable    x

Numerator Factors
Factor 1:  0.14644 + 0.15063 B**(1) + 0.05018 B**(2) + 0.0272 B**(3)


The CROSSCORR= option of the IDENTIFY statement computes sample cross-correlation
functions that show the correlation between the response series and the input series at different
lags (their printout is suppressed here by the NOPRINT option). The sample cross-correlation
function can be used to help identify the form of the transfer function appropriate for an input
series. In this case, the following model has been estimated:
$$Y_t = \mu + (\omega_0 - \omega_1 B - \omega_2 B^2 - \omega_3 B^3) X_t + a_t$$
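As a worked illustration of the estimated model, the following minimal Python sketch applies the reported numerator factor and intercept to the advertising series to obtain fitted sales. The estimates are copied from the PROC ARIMA output above; the function name `fitted_values` is only illustrative and is not part of SAS.

```python
# Estimates copied from the PROC ARIMA output (Estimated Intercept and Factor 1).
MU = 13.61539
WEIGHTS = [0.14644, 0.15063, 0.05018, 0.0272]  # weights on x_t, x_{t-1}, x_{t-2}, x_{t-3}

# Advertising expenditure (x) and sales (y) for t = 1..36, from the data step.
x = [15, 16, 18, 27, 21, 49, 21, 22, 28, 36, 40, 3, 21, 29, 62, 65, 46, 44,
     33, 62, 22, 12, 24, 3, 5, 14, 36, 40, 49, 7, 52, 65, 17, 5, 17, 1]
y = [12.0, 20.5, 21.0, 15.5, 15.3, 23.5, 24.5, 21.3, 23.5, 28.0, 24.0, 15.5,
     17.3, 25.3, 25.0, 36.5, 36.5, 29.6, 30.5, 28.0, 26.0, 21.5, 19.7, 19.0,
     16.0, 20.7, 26.5, 30.6, 32.3, 29.5, 28.3, 31.3, 32.2, 26.4, 23.4, 16.4]


def fitted_values(series, mu, weights):
    """Return mu + sum_i weights[i] * series[t - i] for every t with enough history."""
    max_lag = len(weights) - 1
    return [
        mu + sum(w * series[t - i] for i, w in enumerate(weights))
        for t in range(max_lag, len(series))
    ]


fitted = fitted_values(x, MU, WEIGHTS)
# The first three months have no complete lag history, so residuals start at t = 4.
residuals = [obs - fit for obs, fit in zip(y[3:], fitted)]

print(len(fitted))          # number of usable observations
print(round(fitted[0], 5))  # fitted sales for month t = 4
```

The first three observations are lost to the lags, which is why the output reports 33 residuals for 36 data points.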

This example models the effect of advertising expenditure (x) on sales (y) as a linear function of
the current value and the three most recent past values of advertising expenditure. It is equivalent
to a multiple linear regression of sales (y) on x, LAG(x), LAG2(x), and LAG3(x). This is an example
of a transfer function with one numerator factor. The numerator factors for a transfer function
for an input series are like the MA part of the ARMA model for the noise series. We can also use
transfer functions with denominator factors. The denominator factors for a transfer function for
an input series are like the AR part of the ARMA model for the noise series. Denominator factors
introduce exponentially weighted, infinite distributed lags into the transfer function. To specify
transfer functions with denominator factors, we place the denominator factors after a slash (/) in
the INPUT= option. For example, the following statements estimate the advertising expenditure
effect as an infinite distributed lag model with exponentially declining weights:
proc arima data = sale;
identify var = y crosscorr = x;
estimate input = ( / (1) x );
run;
The transfer function specified by these statements is as follows:
$$\frac{\omega_0}{1 - \delta_1 B} X_t$$
This transfer function also can be written in the following equivalent form:
$$\omega_0 \left( 1 + \sum_{i=1}^{\infty} \delta_1^{i} B^{i} \right) X_t$$
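The equivalence of the two forms can be checked numerically. The sketch below is a minimal illustration, assuming hypothetical values for omega0 and delta1 (they are not estimates from the sales data) and assuming the input is zero before the start of the sample. The function names are illustrative only.

```python
def recursive_filter(x, omega0, delta1):
    """Apply omega0 / (1 - delta1*B) through v_t = delta1 * v_{t-1} + omega0 * x_t."""
    out, prev = [], 0.0  # prev = 0.0 assumes no input before the sample starts
    for value in x:
        prev = delta1 * prev + omega0 * value
        out.append(prev)
    return out


def distributed_lag(x, omega0, delta1):
    """Apply the expanded form omega0 * sum_{i>=0} delta1**i * x_{t-i} (sum stops at sample start)."""
    return [
        sum(omega0 * delta1 ** i * x[t - i] for i in range(t + 1))
        for t in range(len(x))
    ]


omega0, delta1 = 0.5, 0.6            # hypothetical parameter values
x = [15, 16, 18, 27, 21, 49]         # first six advertising values from the example

by_recursion = recursive_filter(x, omega0, delta1)
by_weights = distributed_lag(x, omega0, delta1)
lag_weights = [omega0 * delta1 ** i for i in range(5)]  # exponentially declining weights

print(by_recursion)
print(by_weights)
print(lag_weights)
```

With |delta1| < 1 the lag weights shrink geometrically, which is the "exponentially declining weights" property described in the text.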

This transfer function can be used with intervention inputs. When it is used with a pulse function
input, the result is an intervention effect that dies out gradually over time. When it is used with a
step function input, the result is an intervention effect that increases gradually to a limiting value.
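The two intervention patterns described above can be sketched as follows. This is a minimal illustration with hypothetical values of omega0 and delta1, not estimates from any data in this text; for a unit step the response approaches omega0 / (1 - delta1), which follows from summing the geometric weights.

```python
def denominator_response(x, omega0, delta1):
    """Response of omega0 / (1 - delta1*B) to input x, assuming zero input before the sample."""
    out, prev = [], 0.0
    for value in x:
        prev = delta1 * prev + omega0 * value
        out.append(prev)
    return out


n = 30
pulse = [1.0] + [0.0] * (n - 1)   # one-period intervention
step = [1.0] * n                  # permanent intervention

omega0, delta1 = 2.0, 0.7         # hypothetical parameter values

pulse_effect = denominator_response(pulse, omega0, delta1)
step_effect = denominator_response(step, omega0, delta1)
limiting_value = omega0 / (1 - delta1)

print([round(v, 3) for v in pulse_effect[:5]])  # effect dies out gradually
print([round(v, 3) for v in step_effect[:5]])   # effect builds up gradually
print(round(limiting_value, 3))
```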
2. Volatility Forecasting
One of the main assumptions of standard regression analysis, and of regression models with
autocorrelated errors, is that the variance of the errors is constant. In many practical
applications, this assumption may not be realistic. For example, in financial investment, it is
generally agreed that stock market volatility is rarely constant over time. Indeed, the study of
market volatility as it relates to time is the main interest of many researchers and investors.
A model that incorporates the possibility of a nonconstant error variance is called a
heteroscedasticity model. Many approaches can be used to deal with heteroscedasticity. For
example, weighted regression is often used if the error variance at different times is known.
In practice, however, the error variance is normally unknown; therefore, models that account for
the heteroscedasticity are needed.
Volatility has been one of the most active and successful areas of research in time series
econometrics and economic forecasting in recent decades. Volatility refers to the variability of
the random (unforeseen) component of a time series. In economic theory, volatility connotes
two principal concepts: variability and uncertainty; the former describing overall movement and
the latter referring to movement that is unpredictable.
There are various ways of measuring price volatility. The naive approach treats all
price movements as indicative of instability and calculates the standard deviation of the price index.
This approach does not account for predictable components, such as trends, in the price evolution
process, thereby overstating the uncertainty. A better and more useful method of measuring
instability is the ratio method, in which the instability of the series is calculated by
measuring the standard deviation of log(P_t / P_{t-1}) over a period, where P_t is the price in period t
and P_{t-1} is the price in period t-1. The third approach distinguishes between
predictable and unpredictable components of the price series, but the price volatility is assumed to
remain time invariant. The fourth approach distinguishes not only between predictable and
unpredictable components of prices but also allows the variance of the unpredictable element to be
time varying. Such time-varying conditional variances can be estimated by using a Generalized
Autoregressive Conditional Heteroscedasticity (GARCH) model.
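The first two measures can be compared with a short Python sketch. It uses the sample standard deviation from the standard library; the Delhi prices are the 1988 values from the onion example later in this section, and the steadily growing series is a hypothetical construction added only to show how a pure trend inflates the naive measure.

```python
import math
import statistics


def naive_volatility(prices):
    """Standard deviation of the price levels themselves."""
    return statistics.stdev(prices)


def ratio_volatility(prices):
    """Standard deviation of log(P_t / P_{t-1})."""
    log_ratios = [math.log(cur / prev) for prev, cur in zip(prices, prices[1:])]
    return statistics.stdev(log_ratios)


# Delhi onion spot prices (Rs/kg), January to December 1988.
delhi_1988 = [1.875, 1.898, 1.643, 1.332, 1.262, 1.240,
              1.265, 1.310, 1.467, 1.500, 1.633, 1.780]

# Hypothetical series growing by exactly 5% per period: fully predictable.
steady_trend = [100 * 1.05 ** t for t in range(12)]

print(naive_volatility(delhi_1988), ratio_volatility(delhi_1988))
print(naive_volatility(steady_trend), ratio_volatility(steady_trend))
```

For the steadily growing series the naive measure is large although nothing is uncertain, while the ratio measure is essentially zero.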
2.1 ARCH
The original model of autoregressive conditional heteroscedasticity (ARCH), introduced by Engle
(1982), has the conditional variance equation
$$\sigma_t^2 = \alpha_0 + \alpha_1 \varepsilon_{t-1}^2 + \cdots + \alpha_p \varepsilon_{t-p}^2, \qquad \alpha_0 > 0, \quad \alpha_1, \ldots, \alpha_p \ge 0,$$
where $\varepsilon_t$ is the unexpected return (error) at time t, $\sigma_t^2$ is its conditional
variance, and the constraints on the coefficients are necessary to ensure that the conditional
variance is always positive. This is the ARCH(p) conditional variance specification, with a memory
of p periods. This model captures the conditional heteroscedasticity of returns by using a moving
average of past squared unexpected returns: if a major market movement in either direction
occurred m periods ago (m ≤ p), then the effect will be to increase today's conditional variance.
This means that we are more likely to have a large market move today, so large movements
tend to follow large movements of either sign, which is known as volatility clustering.
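A minimal sketch of the ARCH conditional variance calculation is given below. The coefficient values are hypothetical and chosen only for illustration; the function name is not taken from any library. It shows that a large shock of either sign raises the next conditional variance by the same amount.

```python
def arch_variance(shocks, alpha0, alphas):
    """Conditional variance for the next period: alpha0 + sum_i alphas[i-1] * shock_{t-i}**2."""
    if alpha0 <= 0 or any(a < 0 for a in alphas):
        raise ValueError("need alpha0 > 0 and all other coefficients >= 0")
    p = len(alphas)
    if len(shocks) < p:
        raise ValueError("need at least p past shocks")
    most_recent_first = shocks[::-1][:p]
    return alpha0 + sum(a * e ** 2 for a, e in zip(alphas, most_recent_first))


alpha0, alphas = 0.1, [0.4, 0.2]   # hypothetical ARCH(2) coefficients

calm = arch_variance([0.1, -0.1], alpha0, alphas)
after_rise = arch_variance([0.1, 2.0], alpha0, alphas)    # large positive shock last period
after_fall = arch_variance([0.1, -2.0], alpha0, alphas)   # large negative shock last period

print(calm, after_rise, after_fall)
```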
2.2 Vanilla GARCH
The generalization of Engle's ARCH(p) model by Bollerslev (1986) adds q autoregressive
terms to the moving average of squared unexpected returns. It then takes the form
$$\sigma_t^2 = \alpha_0 + \alpha_1 \varepsilon_{t-1}^2 + \cdots + \alpha_p \varepsilon_{t-p}^2 + \beta_1 \sigma_{t-1}^2 + \cdots + \beta_q \sigma_{t-q}^2, \qquad \alpha_0 > 0, \quad \alpha_1, \ldots, \alpha_p, \beta_1, \ldots, \beta_q \ge 0,$$
where $\sigma_t^2$ is the conditional variance and $\varepsilon_t$ the unexpected return at time t.
The parsimonious GARCH(1,1) model, which has just one lagged squared error and one
autoregressive term, is most commonly used:
$$\sigma_t^2 = \alpha_0 + \alpha_1 \varepsilon_{t-1}^2 + \beta_1 \sigma_{t-1}^2, \qquad \alpha_0 > 0, \quad \alpha_1, \beta_1 \ge 0.$$
It is equivalent to an infinite ARCH model, with exponentially declining weights on the past
squared errors. In the above model, the sum $\alpha_1 + \beta_1$ gives the degree
of persistence of volatility in the series. The closer the sum is to 1, the greater is the tendency of
volatility to persist for a longer time. If the sum exceeds 1, it is indicative of an explosive series
with a tendency to meander away from the mean value. GARCH estimates are used to
identify periods of high volatility and volatility clustering. The constant $\alpha_0$ determines the
long-term average level of volatility to which GARCH forecasts converge. Unlike the lag and return
coefficients, its value is quite sensitive to the length of the data period used to estimate the model. If
a period of many years is used, during which there were extreme market movements, then the
estimate of $\alpha_0$ will be high.
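The roles of persistence and of the constant can be illustrated with a short sketch. It assumes the GARCH(1,1) recursion with a constant, a coefficient on the lagged squared error and a coefficient on the lagged variance (called alpha0, alpha1 and beta1 in the code), and it uses the standard result that, when alpha1 + beta1 < 1, the long-run variance is alpha0 / (1 - alpha1 - beta1). All numerical values are hypothetical.

```python
def garch11_next_variance(prev_variance, prev_shock, alpha0, alpha1, beta1):
    """One-step-ahead conditional variance of a GARCH(1,1) model."""
    return alpha0 + alpha1 * prev_shock ** 2 + beta1 * prev_variance


def long_run_variance(alpha0, alpha1, beta1):
    """Level to which variance forecasts converge; defined only if alpha1 + beta1 < 1."""
    persistence = alpha1 + beta1
    if persistence >= 1:
        raise ValueError("no finite long-run variance when alpha1 + beta1 >= 1")
    return alpha0 / (1 - persistence)


def garch11_forecasts(next_variance, horizon, alpha0, alpha1, beta1):
    """Variance forecasts for 1..horizon steps ahead, starting from the one-step forecast."""
    forecasts = [next_variance]
    for _ in range(horizon - 1):
        # Beyond one step, the expected squared shock equals the expected variance.
        forecasts.append(alpha0 + (alpha1 + beta1) * forecasts[-1])
    return forecasts


alpha0, alpha1, beta1 = 0.02, 0.08, 0.90   # hypothetical; persistence = 0.98

one_step = garch11_next_variance(prev_variance=1.0, prev_shock=3.0,
                                 alpha0=alpha0, alpha1=alpha1, beta1=beta1)
path = garch11_forecasts(one_step, 500, alpha0, alpha1, beta1)
target = long_run_variance(alpha0, alpha1, beta1)

print(one_step, path[9], path[99], path[-1], target)
```

With persistence close to 1, the forecasts return to the long-run level only slowly after a large shock.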
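2.3 Integrated GARCH
When the coefficients on the lagged squared error and on the lagged conditional variance sum to
one, $\alpha_1 + \beta_1 = 1$, we can put $\beta_1 = \lambda$ and write the GARCH(1,1) model as
$$\sigma_t^2 = \alpha_0 + (1 - \lambda) \varepsilon_{t-1}^2 + \lambda \sigma_{t-1}^2, \qquad 0 \le \lambda \le 1.$$
This is a non-stationary GARCH model called the integrated GARCH (I-GARCH) model, for
which term structure forecasts do not converge. Our main interest in the I-GARCH model is that
when the constant $\alpha_0 = 0$ it is equivalent to an infinite Exponentially Weighted Moving Average (EWMA).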
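The EWMA equivalence can be verified numerically. In the sketch below the recursion without a constant is unrolled into a weighted sum of past squared errors with weights (1 - lambda) * lambda**(i-1); the value of lambda, the shocks and the starting variance are hypothetical, and the finite sample means a small weight lambda**n remains on the starting variance.

```python
def ewma_recursive(shocks, lam, initial_variance):
    """Iterate variance = (1 - lam) * shock**2 + lam * variance over the shocks in time order."""
    variance = initial_variance
    for e in shocks:
        variance = (1 - lam) * e ** 2 + lam * variance
    return variance


def ewma_weighted_sum(shocks, lam, initial_variance):
    """Same quantity written as an exponentially weighted sum of past squared shocks."""
    n = len(shocks)
    total = lam ** n * initial_variance  # residual weight on the starting value
    for i, e in enumerate(reversed(shocks), start=1):
        total += (1 - lam) * lam ** (i - 1) * e ** 2
    return total


lam = 0.94                                     # hypothetical smoothing constant
shocks = [0.5, -1.2, 0.3, 2.0, -0.7, 0.1]      # hypothetical unexpected returns
start = 1.0                                    # hypothetical starting variance

print(ewma_recursive(shocks, lam, start))
print(ewma_weighted_sum(shocks, lam, start))
```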
2.4 Example for GARCH modeling
In this example, we consider a bivariate series containing 46 monthly observations on Mumbai
and Delhi spot prices for onion from January 1988 to October 1991, measured in rupees per
kilogram (Rs/kg). Mumbai is the main market for Maharashtra, which is one of the major onion
growing states. The objective is to examine whether one can predict the Delhi spot price from the
current spot price of Mumbai using a time series regression model. This is a case of
regression with time series errors and unequal variances.
Title1 'Spot prices of onion in Delhi and Mumbai';
Title2 '(January 1988 to October 1991)';
DATA onion;
Observations = _N_;
INPUT Year Month Delhi Mumbai;
CARDS;
1988  1 1.875 2.065
1988  2 1.898 1.988
1988  3 1.643 1.818
1988  4 1.332 1.493
1988  5 1.262 1.383
1988  6 1.240 1.378
1988  7 1.265 1.433
1988  8 1.310 1.543
1988  9 1.467 1.713
1988 10 1.500 1.688
1988 11 1.633 1.908
1988 12 1.780 2.207
1989  1 1.803 2.173
1989  2 1.472 1.740
1989  3 1.247 1.458
1989  4 1.273 1.515
1989  5 1.373 1.642
1989  6 1.408 1.687
1989  7 1.378 1.643
1989  8 1.375 1.575
1989  9 1.308 1.513
1989 10 1.315 1.555
1989 11 1.447 1.765
1989 12 1.750 2.102
1990  1 2.203 2.420
1990  2 1.623 1.982
1990  3 1.263 1.487
1990  4 1.252 1.468
1990  5 1.252 1.450
1990  6 1.277 1.460
1990  7 1.252 1.418
1990  8 1.220 1.340
1990  9 1.240 1.353
1990 10 1.412 1.568
1990 11 1.807 2.052
1990 12 1.903 2.237
1991  1 1.627 1.830
1991  2 1.223 1.340
1991  3 1.208 1.318
1991  4 1.208 1.320
1991  5 1.205 1.298
1991  6 1.165 1.258
1991  7 1.020 1.117
1991  8 1.065 1.137
1991  9 1.287 1.368
1991 10 1.613 1.732
;
/* Regression of Delhi on Mumbai with AR(1) errors (NLAG=1) and an ARCH(1) variance (GARCH=(Q=1)) */
PROC AUTOREG DATA=Onion;
MODEL Delhi = Mumbai / NLAG=1 GARCH=(Q=1);
/* Save residuals, predicted values and 95% confidence limits */
OUTPUT OUT=Onion R=Residual P=Predicted
LCL=Low95CL UCL=Up95CL;
RUN;
PROC PRINT DATA=Onion;
RUN;
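In equation form, the model requested by the MODEL statement can be sketched as below. This assumes the sign and naming conventions of the PROC AUTOREG documentation (the autoregressive coefficient enters with a negative sign, and ARCH0 and ARCH1 in the output label the constant and the lag-one coefficient of the variance equation). The symbols $\nu_t$, $\varphi_1$, $h_t$, $\omega$ and $e_t$ are introduced here for illustration and are not used elsewhere in this text.

```latex
\begin{aligned}
\text{Delhi}_t &= \beta_0 + \beta_1 \, \text{Mumbai}_t + \nu_t
  && \text{(regression)} \\
\nu_t &= \varepsilon_t - \varphi_1 \nu_{t-1}
  && \text{(AR(1) error, NLAG=1)} \\
\varepsilon_t &= \sqrt{h_t} \, e_t, \qquad e_t \sim \text{i.i.d. } N(0, 1) \\
h_t &= \omega + \alpha_1 \varepsilon_{t-1}^2
  && \text{(ARCH(1) variance, GARCH=(Q=1))}
\end{aligned}
```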
The output for the above SAS code is given below:

Spot prices of onion in Delhi and Mumbai
(January 1988 to October 1991)

The AUTOREG Procedure

Dependent Variable    Delhi

Ordinary Least Squares Estimates

SSE                  0.1650337    DFE                       44
MSE                    0.00375    Root MSE             0.06124
SBC                 -120.79173    AIC               -124.44902
Regress R-Square        0.9444    Total R-Square        0.9444
Durbin-Watson           0.9324

                              Standard              Approx
Variable     DF   Estimate       Error   t Value   Pr > |t|
Intercept     1     0.1184      0.0487      2.43     0.0192
Mumbai        1     0.8038      0.0294     27.35     <.0001

Estimates of Autocorrelations

Lag   Covariance   Correlation   -1 9 8 7 6 5 4 3 2 1 0 1 2 3 4 5 6 7 8 9 1
  0      0.00359      1.000000   |                    |********************|
  1      0.00170      0.473597   |                    |*********           |

Preliminary MSE    0.00278

Estimates of Autoregressive Parameters

                      Standard
Lag   Coefficient        Error   t Value
  1     -0.473597     0.134312     -3.53

Algorithm converged.

The AUTOREG Procedure

GARCH Estimates

SSE                 0.12768143    Observations              46
MSE                    0.00278    Uncond Var                 .
Log Likelihood      80.3575828    Total R-Square        0.9570
SBC                 -141.57196    AIC               -150.71517
Normality Test          0.0440    Pr > ChiSq            0.9782

Table: Estimation of the GARCH(0,1) model

                              Standard              Approx
Variable     DF   Estimate       Error   t Value   Pr > |t|
Intercept     1     0.0677      0.0223      3.04     0.0023
Mumbai        1     0.8584      0.0111     77.01     <.0001
AR1           1    -0.6475      0.0542    -11.94     <.0001
ARCH0         1   0.000219    0.000242      0.90     0.3662
ARCH1         1     1.9559      0.7352      2.66     0.0078

Obs   Predicted   Residual   Low95CL    Up95CL   Observations   Year   Month   Delhi   Mumbai
  1     1.84024    0.02649   1.67983   2.00065              1   1988       1   1.875    2.065
  2     1.79665    0.10135   1.66931   1.92399              2   1988       2   1.898    1.988
  3     1.70841   -0.06541   1.58341   1.83342              3   1988       3   1.643    1.818
  4     1.35882   -0.02682   1.23434   1.48331              4   1988       4   1.332    1.493
  5     1.24367    0.01833   1.11817   1.36916              5   1988       5   1.262    1.383
  6     1.25518   -0.01518   1.12963   1.38074              6   1988       6   1.240    1.378
  7     1.29093   -0.02593   1.16597   1.41589              7   1988       7   1.265    1.433
  8     1.37097   -0.06097   1.24675   1.49519              8   1988       8   1.310    1.543
  9     1.48489   -0.01789   1.36063   1.60915              9   1988       9   1.467    1.713
 10     1.47061    0.02939   1.34644   1.59477             10   1988      10   1.500    1.688
 11     1.69471   -0.06171   1.56864   1.82077             11   1988      11   1.633    1.908
 12     1.91520   -0.13520   1.78290   2.04750             12   1988      12   1.780    2.207
 13     1.81503   -0.01203   1.68363   1.94642             13   1989       1   1.803    2.173
 14     1.47714   -0.00514   1.35274   1.60154             14   1989       2   1.472    1.740
 15     1.26142   -0.01442   1.13668   1.38616             15   1989       3   1.247    1.458
 16     1.32139   -0.04839   1.19703   1.44574             16   1989       4   1.273    1.515
 17     1.41555   -0.04255   1.29149   1.53962             17   1989       5   1.373    1.642
 18     1.44835   -0.04035   1.32419   1.57251             18   1989       6   1.408    1.687
 19     1.40823   -0.03023   1.28416   1.53230             19   1989       7   1.378    1.643
 20     1.35489    0.02011   1.23077   1.47901             20   1989       8   1.375    1.575
 21     1.33752   -0.02952   1.21316   1.46188             21   1989       9   1.308    1.513
 22     1.36465   -0.04965   1.24047   1.48883             22   1989      10   1.315    1.555
 23     1.52610   -0.07910   1.40154   1.65066             23   1989      11   1.447    1.765
 24     1.78412   -0.03412   1.65446   1.91378             24   1989      12   1.750    2.102
 25     2.06597    0.13703   1.92696   2.20499             25   1990       1   2.203    2.420
 26     1.80658   -0.18358   1.67934   1.93381             26   1990       2   1.623    1.982
 27     1.24958    0.01342   1.12506   1.37411             27   1990       3   1.263    1.487
 28     1.27529   -0.02329   1.15063   1.39995             28   1990       4   1.252    1.468
 29     1.26327   -0.01127   1.13847   1.38808             29   1990       5   1.252    1.450
 30     1.28186   -0.00486   1.15714   1.40658             30   1990       6   1.277    1.460
 31     1.25644   -0.00444   1.13133   1.38155             31   1990       7   1.252    1.418
 32     1.19664    0.02336   1.07059   1.32269             32   1990       8   1.220    1.340
 33     1.23043    0.00957   1.10456   1.35630             33   1990       9   1.240    1.353
 34     1.42070   -0.00870   1.29657   1.54484             34   1990      10   1.412    1.568
 35     1.82802   -0.02102   1.69945   1.95659             35   1990      11   1.807    2.052
 36     1.97358   -0.07058   1.84045   2.10672             36   1990      12   1.903    2.237
 37     1.58357    0.04343   1.45845   1.70869             37   1991       1   1.627    1.830
 38     1.21047    0.01253   1.08442   1.33651             38   1991       2   1.223    1.340
 39     1.20233    0.00567   1.07596   1.32869             39   1991       3   1.208    1.318
 40     1.20656    0.00144   1.08022   1.33290             40   1991       4   1.208    1.320
 41     1.18656    0.01844   1.05989   1.31324             41   1991       5   1.205    1.298
 42     1.16251    0.00249   1.03517   1.28986             42   1991       6   1.165    1.258
 43     1.03782   -0.01782   0.90751   1.16812             43   1991       7   1.020    1.117
 44     1.03946    0.02554   0.90963   1.16930             44   1991       8   1.065    1.137
 45     1.25577    0.03123   1.13009   1.38144             45   1991       9   1.287    1.368
 46     1.58357    0.02943   1.45921   1.70792             46   1991      10   1.613    1.732

The detailed interpretation of the above analyses will be discussed in class.
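As a cross-check on the ordinary least squares stage reported by PROC AUTOREG, the regression of Delhi on Mumbai and the Durbin-Watson statistic of its residuals can be recomputed from the listed data. The sketch below is a minimal plain-Python illustration, not a replacement for the SAS procedure; the function names are illustrative, and the printed values can be compared with the Intercept, Mumbai, SSE and Durbin-Watson entries in the OLS part of the output.

```python
delhi = [1.875, 1.898, 1.643, 1.332, 1.262, 1.240, 1.265, 1.310, 1.467, 1.500, 1.633, 1.780,
         1.803, 1.472, 1.247, 1.273, 1.373, 1.408, 1.378, 1.375, 1.308, 1.315, 1.447, 1.750,
         2.203, 1.623, 1.263, 1.252, 1.252, 1.277, 1.252, 1.220, 1.240, 1.412, 1.807, 1.903,
         1.627, 1.223, 1.208, 1.208, 1.205, 1.165, 1.020, 1.065, 1.287, 1.613]
mumbai = [2.065, 1.988, 1.818, 1.493, 1.383, 1.378, 1.433, 1.543, 1.713, 1.688, 1.908, 2.207,
          2.173, 1.740, 1.458, 1.515, 1.642, 1.687, 1.643, 1.575, 1.513, 1.555, 1.765, 2.102,
          2.420, 1.982, 1.487, 1.468, 1.450, 1.460, 1.418, 1.340, 1.353, 1.568, 2.052, 2.237,
          1.830, 1.340, 1.318, 1.320, 1.298, 1.258, 1.117, 1.137, 1.368, 1.732]


def simple_ols(x, y):
    """Intercept and slope of the least squares line y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def durbin_watson(residuals):
    """Sum of squared successive differences divided by the sum of squared residuals."""
    num = sum((b - a) ** 2 for a, b in zip(residuals, residuals[1:]))
    return num / sum(r ** 2 for r in residuals)


intercept, slope = simple_ols(mumbai, delhi)
residuals = [d - (intercept + slope * m) for m, d in zip(mumbai, delhi)]
sse = sum(r ** 2 for r in residuals)

print(intercept, slope, sse, durbin_watson(residuals))
```

A Durbin-Watson value well below 2 points to positive autocorrelation in the OLS residuals, which is the motivation for the NLAG=1 option in the MODEL statement.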


References
Bollerslev, T. (1986). Generalized autoregressive conditional heteroscedasticity. Journal of
Econometrics, 31, 307-327.
Box, G.E.P., Jenkins, G.M. and Reinsel, G.C. (1994). Time Series Analysis: Forecasting and
Control. Pearson Education, Delhi.
Croxton, F.E., Cowden, D.J. and Klein, S. (1979). Applied General Statistics. Prentice Hall of
India Pvt. Ltd., New Delhi.
Engle, R. (1982). Autoregressive conditional heteroscedasticity with estimates of the variance of
United Kingdom inflation. Econometrica, 50, 987-1007.
Makridakis, S., Wheelwright, S.C. and Hyndman, R.J. (1998). Forecasting: Methods and
Applications, 3rd Edition. John Wiley, New York.
Pankratz, A. (1983). Forecasting with Univariate Box-Jenkins Models: Concepts and Cases.
John Wiley, New York.
