
Module 2.2: Multiple regression, CLRM violations and issues
(Ch 3 and 4 in Brooks)
Note: Some contents in this lecture note are originally based on Chris Brooks's Introductory Econometrics for Finance (2002).

1
Learning objectives
Recognise the extension of two-variable regression to multiple regression.
Develop an integrated knowledge of CLRM violations: sources of problems, consequences, diagnostic tests, and remedies.
The above means that you will become familiar with classical terms in econometrics such as heteroskedasticity, autocorrelation, multicollinearity, and others.
Recognise the classic simple multiple regression analysis in finance from the Rozeff (1982) paper.


2
Generalising the Two-variable Model to
Multiple Linear Regression

Before, we have used the model
$y_t = \alpha + \beta x_t + u_t$,   t = 1, 2, ..., T
But what if our dependent (y) variable depends on more than one independent variable?
For example, the number of cars sold might plausibly depend on
1. the price of cars
2. the price of public transport
3. the price of petrol
4. the extent of the public's concern about global warming
Similarly, stock returns might depend on several factors.
Having just one independent variable is no good in this case - we want to have more than one x variable. It is very easy to generalise the simple model to one with k-1 regressors (independent variables).

3
Multiple Regression and the Constant Term

Now we write
$y_t = \beta_1 + \beta_2 x_{2t} + \beta_3 x_{3t} + \ldots + \beta_k x_{kt} + u_t$,   t = 1, 2, ..., T

Where is $x_1$? It is the constant term. In fact the constant term is usually represented by a column of ones of length T:
$x_1 = (1 \;\; 1 \;\; \ldots \;\; 1)'$

$\beta_1$ is the coefficient attached to the constant term (which we called $\alpha$ before).


4
Different Ways of Expressing
the Multiple Linear Regression Model

We could write out a separate equation for every value of t:
$y_1 = \beta_1 + \beta_2 x_{21} + \beta_3 x_{31} + \ldots + \beta_k x_{k1} + u_1$
$y_2 = \beta_1 + \beta_2 x_{22} + \beta_3 x_{32} + \ldots + \beta_k x_{k2} + u_2$
$\vdots$
$y_T = \beta_1 + \beta_2 x_{2T} + \beta_3 x_{3T} + \ldots + \beta_k x_{kT} + u_T$

We can write this in matrix form
$y = X\beta + u$
where y is T × 1, X is T × k, $\beta$ is k × 1, and u is T × 1.
5

Inside the Matrices of the
Multiple Linear Regression Model

e.g. if k is 2, we have 2 regressors, one of which is a column of ones:
\[
\underbrace{\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_T \end{bmatrix}}_{T \times 1}
=
\underbrace{\begin{bmatrix} 1 & x_{21} \\ 1 & x_{22} \\ \vdots & \vdots \\ 1 & x_{2T} \end{bmatrix}}_{T \times 2}
\underbrace{\begin{bmatrix} \beta_1 \\ \beta_2 \end{bmatrix}}_{2 \times 1}
+
\underbrace{\begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_T \end{bmatrix}}_{T \times 1}
\]

Notice that the matrices written in this way are conformable.

6

How Do We Calculate the Parameters (the β's) in this Generalised Case?
Previously, we took the residual sum of squares, and minimised it w.r.t. $\alpha$ and $\beta$.
In the matrix notation, we have
\[
\hat{u} = \begin{bmatrix} \hat{u}_1 \\ \hat{u}_2 \\ \vdots \\ \hat{u}_T \end{bmatrix}
\]
The RSS would be given by
\[
\hat{u}'\hat{u} = \hat{u}_1^2 + \hat{u}_2^2 + \ldots + \hat{u}_T^2 = \sum_t \hat{u}_t^2
\]
7
The OLS Estimator for the
Multiple Regression Model
In order to obtain the parameter estimates, $\beta_1, \beta_2, \ldots, \beta_k$, we would minimise the RSS with respect to all the $\beta$'s.

It can be shown that
\[
\hat{\beta} = \begin{bmatrix} \hat{\beta}_1 \\ \hat{\beta}_2 \\ \vdots \\ \hat{\beta}_k \end{bmatrix} = (X'X)^{-1} X'y
\]
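For reference, a brief sketch of where this formula comes from: differentiate the matrix form of the RSS from the previous slide and set the derivative to zero.
\[
RSS = \hat{u}'\hat{u} = (y - X\hat{\beta})'(y - X\hat{\beta}) = y'y - 2\hat{\beta}'X'y + \hat{\beta}'X'X\hat{\beta}
\]
\[
\frac{\partial RSS}{\partial \hat{\beta}} = -2X'y + 2X'X\hat{\beta} = 0
\;\;\Rightarrow\;\; X'X\hat{\beta} = X'y
\;\;\Rightarrow\;\; \hat{\beta} = (X'X)^{-1}X'y
\]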






8
Calculating the Standard Errors for the Multiple Regression Model
Check the dimensions: $\hat{\beta}$ is k × 1 as required.

But how do we calculate the standard errors of the coefficient estimates?

Previously, to estimate the variance of the errors, $\sigma^2$, we used $s^2 = \frac{\sum_t \hat{u}_t^2}{T-2}$.

Now using the matrix notation, we use
\[
s^2 = \frac{\hat{u}'\hat{u}}{T-k}
\]
where k = number of regressors. It can be proved that the OLS estimator of the variance of $\hat{\beta}$ is given by the diagonal elements of $s^2 (X'X)^{-1}$, so that the variance of $\hat{\beta}_1$ is the first element, the variance of $\hat{\beta}_2$ is the second element, ..., and the variance of $\hat{\beta}_k$ is the k-th diagonal element.


9

Calculating Parameter and Standard Error Estimates
for Multiple Regression Models: An Example
Example: The following model with k = 3 is estimated over 15 observations:
$y = \beta_1 + \beta_2 x_2 + \beta_3 x_3 + u$
and the following data have been calculated from the original X's:
\[
(X'X)^{-1} = \begin{bmatrix} 2.0 & 3.5 & -1.0 \\ 3.5 & 1.0 & 6.5 \\ -1.0 & 6.5 & 4.3 \end{bmatrix}, \quad
X'y = \begin{bmatrix} -3.0 \\ 2.2 \\ 0.6 \end{bmatrix}, \quad
\hat{u}'\hat{u} = 10.96
\]
Calculate the coefficient estimates and their standard errors.
To calculate the coefficients, just multiply the matrix by the vector to obtain $\hat{\beta} = (X'X)^{-1}X'y$.
To calculate the standard errors, we need an estimate of $\sigma^2$:
\[
s^2 = \frac{RSS}{T-k} = \frac{10.96}{15-3} = 0.91
\]


10
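Carrying out the multiplication explicitly (a worked check of the calculation; the same estimates appear on the following slides):
\[
\hat{\beta} = (X'X)^{-1}X'y =
\begin{bmatrix} 2.0 & 3.5 & -1.0 \\ 3.5 & 1.0 & 6.5 \\ -1.0 & 6.5 & 4.3 \end{bmatrix}
\begin{bmatrix} -3.0 \\ 2.2 \\ 0.6 \end{bmatrix}
=
\begin{bmatrix} -6.0 + 7.7 - 0.6 \\ -10.5 + 2.2 + 3.9 \\ 3.0 + 14.3 + 2.58 \end{bmatrix}
=
\begin{bmatrix} 1.10 \\ -4.40 \\ 19.88 \end{bmatrix}
\]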
Calculating Parameter and Standard Error Estimates
for Multiple Regression Models: An Example (contd)
The variance-covariance matrix of $\hat{\beta}$ is given by
\[
Var(\hat{\beta}) = s^2 (X'X)^{-1} = 0.91 (X'X)^{-1} =
\begin{bmatrix} 1.83 & 3.20 & -0.91 \\ 3.20 & 0.91 & 5.94 \\ -0.91 & 5.94 & 3.93 \end{bmatrix}
\]
The variances are on the leading diagonal:
$Var(\hat{\beta}_1) = 1.83$,   $SE(\hat{\beta}_1) = 1.35$
$Var(\hat{\beta}_2) = 0.91$,   $SE(\hat{\beta}_2) = 0.96$
$Var(\hat{\beta}_3) = 3.93$,   $SE(\hat{\beta}_3) = 1.98$

We write:
$\hat{y}_t = 1.10 - 4.40\, x_{2t} + 19.88\, x_{3t}$
            (1.35)   (0.96)       (1.98)
11


A Special Type of Hypothesis Test: The t-ratio

Recall that the formula for a test of significance approach to hypothesis testing using a t-test was
\[
\text{test statistic} = \frac{\hat{\beta}_i - \beta_i^*}{SE(\hat{\beta}_i)}
\]
If the test is
H0: $\beta_i = 0$
H1: $\beta_i \neq 0$
i.e. a test that the population coefficient is zero against a two-sided alternative, this is known as a t-ratio test.

Since $\beta_i^* = 0$,
\[
\text{test stat} = \frac{\hat{\beta}_i}{SE(\hat{\beta}_i)}
\]
The ratio of the coefficient to its SE is known as the t-ratio or t-statistic.
12

The t-ratio: An Example

In the last example above:

              $\hat{\beta}_1$    $\hat{\beta}_2$    $\hat{\beta}_3$
Coefficient   1.10      -4.40      19.88
SE            1.35       0.96       1.98
t-ratio       0.81      -4.63      10.04
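Each t-ratio is simply the coefficient divided by its standard error, for example:
\[
\frac{1.10}{1.35} \approx 0.81, \qquad \frac{19.88}{1.98} \approx 10.04
\]
(the small difference for $\hat{\beta}_2$'s t-ratio arises from rounding in the reported standard error).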

Compare these with $t_{crit}$ with T - k = 15 - 3 = 12 degrees of freedom
(2.5% in each tail for a two-sided 5% test): = 2.179 at 5%
                                              = 3.055 at 1%

Do we reject H0: $\beta_1$ = 0? (No)
             H0: $\beta_2$ = 0? (Yes)
             H0: $\beta_3$ = 0? (Yes)

13

What Does the t-ratio tell us?

If we reject H0, we say that the result is significant. If the coefficient is not significant (e.g. the intercept coefficient in the last regression above), then it means that the variable is not helping to explain variations in y. Variables that are not significant are usually removed from the regression model.
In practice there are good statistical reasons for always having a constant
even if it is not significant. Look at what happens if no intercept is included:

14

[Figure: $y_t$ plotted against $x_t$, showing the effect of forcing the regression line through the origin]
Data Mining
Data mining is the searching among many series for statistical
relationships without theoretical justification.
For example, suppose we generate one dependent variable and twenty
explanatory variables completely randomly and independently of each
other.
If we regress the dependent variable separately on each independent
variable, on average one slope coefficient will be significant at 5%.
If data mining occurs, the true significance level will be greater than the
nominal significance level.
Even path-breaking papers like Fama and French (1992) and others are
sometimes criticized for data mining!!

Resolution in financial research: out-of-sample tests, through international evidence or more recent data (expansion/robustness of the results through time).
15
Testing Multiple Hypotheses: The F-test
We used the t-test to test single hypotheses, i.e. hypotheses involving only
one coefficient. But what if we want to test more than one coefficient
simultaneously?

We do this using the F-test. The F-test involves estimating 2 regressions.

The unrestricted regression is the one in which the coefficients are freely
determined by the data, as we have done before.

The restricted regression is the one in which the coefficients are restricted,
i.e. the restrictions are imposed on some |s.

16
The F-test:
Restricted and Unrestricted Regressions

Example
The general regression is
$y_t = \beta_1 + \beta_2 x_{2t} + \beta_3 x_{3t} + \beta_4 x_{4t} + u_t$   (1)

We want to test the restriction that $\beta_3 + \beta_4 = 1$ (we have some hypothesis from theory which suggests that this would be an interesting hypothesis to study). The unrestricted regression is (1) above, but what is the restricted regression?
$y_t = \beta_1 + \beta_2 x_{2t} + \beta_3 x_{3t} + \beta_4 x_{4t} + u_t$   s.t. $\beta_3 + \beta_4 = 1$

We substitute the restriction ($\beta_3 + \beta_4 = 1$) into the regression so that it is automatically imposed on the data:
$\beta_3 + \beta_4 = 1 \;\Rightarrow\; \beta_4 = 1 - \beta_3$


17
The F-test: Forming the Restricted Regression


$y_t = \beta_1 + \beta_2 x_{2t} + \beta_3 x_{3t} + (1 - \beta_3) x_{4t} + u_t$
$y_t = \beta_1 + \beta_2 x_{2t} + \beta_3 x_{3t} + x_{4t} - \beta_3 x_{4t} + u_t$

Gather terms in the $\beta$'s together and rearrange:
$(y_t - x_{4t}) = \beta_1 + \beta_2 x_{2t} + \beta_3 (x_{3t} - x_{4t}) + u_t$

This is the restricted regression. We actually estimate it by creating two new variables, call them, say, $P_t$ and $Q_t$:
$P_t = y_t - x_{4t}$
$Q_t = x_{3t} - x_{4t}$
so
$P_t = \beta_1 + \beta_2 x_{2t} + \beta_3 Q_t + u_t$
is the restricted regression we actually estimate.
18
Calculating the F-Test Statistic

The test statistic is given by
\[
\text{test statistic} = \frac{RRSS - URSS}{URSS} \times \frac{T-k}{m}
\]
where URSS = RSS from unrestricted regression
RRSS = RSS from restricted regression
m = number of restrictions
T = number of observations
k = number of regressors in the unrestricted regression, including a constant (or the total number of parameters to be estimated).
19
The F-Distribution
The test statistic follows the F-distribution, which has 2 d.f.
parameters.

The values of the degrees of freedom parameters are m and (T - k) respectively (the order of the d.f. parameters is important).

The appropriate critical value will be in column m, row (T-k).

The F-distribution has only positive values and is not symmetrical. We
therefore only reject the null if the test statistic > critical F-value.

From the example, the null hypothesis (H0) is $\beta_3 + \beta_4 = 1$ (the restricted regression), and if F > $F_{critical}(m, T-k)$ then reject H0.

20
Determining the Number of Restrictions in an F-test
Examples:
H0: hypothesis                                   No. of restrictions, m
$\beta_1 + \beta_2 = 2$                                      1
$\beta_2 = 1$ and $\beta_3 = -1$                             2
$\beta_2 = 0$, $\beta_3 = 0$ and $\beta_4 = 0$               3

If the model is $y_t = \beta_1 + \beta_2 x_{2t} + \beta_3 x_{3t} + \beta_4 x_{4t} + u_t$,
then the null hypothesis
H0: $\beta_2 = 0$, and $\beta_3 = 0$ and $\beta_4 = 0$ is tested by the regression F-statistic. It tests the null hypothesis that all of the coefficients except the intercept coefficient are zero. In other words, it tests whether the model is a 'junk' regression. In most cases, we are dealing with this hypothesis.

Note the form of the alternative hypothesis for all tests when more than one restriction is involved: H1: $\beta_2 \neq 0$, or $\beta_3 \neq 0$, or $\beta_4 \neq 0$
21
The Relationship between the t and the F-
Distributions

Any hypothesis which could be tested with a t-test could have been tested using an F-test, but not the other way around.

For example, consider the hypothesis
H0: $\beta_2 = 0.5$
H1: $\beta_2 \neq 0.5$
We could have tested this using the usual t-test:
\[
\text{test stat} = \frac{\hat{\beta}_2 - 0.5}{SE(\hat{\beta}_2)}
\]
or it could be tested in the framework above for the F-test.

Note that the two tests always give the same result since the t-distribution is just a special case of the F-distribution.

For example, if we have some random variable Z, and Z ~ t(T-k), then also Z² ~ F(1, T-k).
22
F-test Example

Question: Suppose a researcher wants to test whether the returns on a company stock (y) show unit sensitivity to two factors (factor $x_2$ and factor $x_3$) among three considered. The regression is carried out on 144 monthly observations. The regression is
$y_t = \beta_1 + \beta_2 x_{2t} + \beta_3 x_{3t} + \beta_4 x_{4t} + u_t$

- What are the restricted and unrestricted regressions?
- If the two RSS are 436.1 and 397.2 respectively, perform the test.

Solution:
Unit sensitivity implies H0: $\beta_2 = 1$ and $\beta_3 = 1$. The unrestricted regression is the one in the question. The restricted regression is $(y_t - x_{2t} - x_{3t}) = \beta_1 + \beta_4 x_{4t} + u_t$, or letting $z_t = y_t - x_{2t} - x_{3t}$, the restricted regression is $z_t = \beta_1 + \beta_4 x_{4t} + u_t$.

In the F-test formula, T = 144, k = 4, m = 2, RRSS = 436.1, URSS = 397.2.
F-test statistic = 6.86. The critical value is F(2,140) = 3.07 (5%) and 4.79 (1%).
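Substituting into the F-test formula from the earlier slide (a worked check of the calculation):
\[
F = \frac{436.1 - 397.2}{397.2} \times \frac{144 - 4}{2} = \frac{38.9}{397.2} \times 70 \approx 6.86
\]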

Conclusion: Reject H0.
23
Goodness of fit statistics for the multiple regression model: Adjusted R²


Recall that $R^2$ always increases as you add more variables to the regression model. In order to get around this problem, a modification is often made which takes into account the loss of degrees of freedom associated with adding extra variables. This is known as $\bar{R}^2$, or adjusted $R^2$:
\[
\bar{R}^2 = 1 - \left[ \frac{T-1}{T-k}\,(1 - R^2) \right]
\]
So if we add an extra regressor, k increases and unless $R^2$ increases by a more than offsetting amount, $\bar{R}^2$ will actually fall.
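A quick illustration with made-up numbers (hypothetical, just to show the mechanics): suppose T = 20 and a model with k = 3 has $R^2 = 0.60$, so $\bar{R}^2 = 1 - \frac{19}{17}(0.40) \approx 0.553$. Adding a fourth regressor that only lifts $R^2$ to 0.61 gives $\bar{R}^2 = 1 - \frac{19}{16}(0.39) \approx 0.537$, so the adjusted $R^2$ falls even though $R^2$ rises.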

There are still problems with the $\bar{R}^2$ criterion:
1. A 'soft' rule
2. No distribution for $\bar{R}^2$ or $R^2$


24
Violations of the Assumptions of the CLRM
Recall the assumptions we made about the CLRM disturbance terms:

1. $E(u_t) = 0$
2. $Var(u_t) = \sigma^2 < \infty$
3. $Cov(u_i, u_j) = 0$
4. The X matrix is non-stochastic or fixed in repeated samples
5. $u_t \sim N(0, \sigma^2)$

25
Investigating Violations of the
Assumptions of the CLRM
We will now study each of these assumptions by looking at:
- Diagnostic test for violations
- Causes
- Consequences
In general we could encounter any combination of 3 problems:
- the coefficient estimates are wrong (Bias)
- the associated standard errors are wrong (Inefficiency)
- the distribution that we assumed for the test statistics will be
inappropriate (Inefficiency)
- Remedy

26
Statistical Distributions for Diagnostic Tests
Often, an F- and a $\chi^2$-version of the test are available.

The F-test version involves estimating a restricted and an unrestricted version of a test regression and comparing the RSS.

The $\chi^2$-version is sometimes called an LM test, and only has one degree of freedom parameter: the number of restrictions being tested, m.

Asymptotically, the two tests are equivalent since the $\chi^2$ is a special case of the F-distribution:
\[
m \cdot F(m, T-k) \rightarrow \chi^2(m) \quad \text{as } T-k \rightarrow \infty
\]
For small samples, the F-version is preferable.
27
28
Violating Assumption 1: $E(u_t) = 0$
Assump #1: The average value of the errors (disturbances) is zero
Theme: The exclusion of the intercept term

Diagnostic test: - None

Causes: - The exclusion of the intercept (constant) term (Y is forced to be zero when X is zero)
Consequences: 1) Biases in slope coefficient estimates
2) R-squared and adjusted R-squared become meaningless
3) e.g. You may get a negative R-squared!
Remedy: - Include an intercept term unless finance theories suggest otherwise

Violating Assumption 2: $Var(u_t) = \sigma^2 < \infty$
Assump #2: The variance of the errors is constant
Theme: Heteroskedasticity
We have so far assumed that the variance of the errors is constant, $\sigma^2$ - this is known as homoskedasticity. If the errors do not have a constant variance, we say that they are heteroskedastic, e.g. say we estimate a regression and calculate the residuals, $\hat{u}_t$.

29
[Figure: residuals $\hat{u}_t$ plotted against $x_{2t}$, illustrating heteroskedasticity]

Detection of Heteroskedasticity

Graphical methods
Formal tests:
One of the best is White's general test for heteroskedasticity.

The test is carried out as follows:
1. Assume that the regression we carried out is as follows
$y_t = \beta_1 + \beta_2 x_{2t} + \beta_3 x_{3t} + u_t$
and we want to test $Var(u_t) = \sigma^2$. We estimate the model, obtaining the residuals, $\hat{u}_t$.

2. Then run the auxiliary regression
$\hat{u}_t^2 = \alpha_1 + \alpha_2 x_{2t} + \alpha_3 x_{3t} + \alpha_4 x_{2t}^2 + \alpha_5 x_{3t}^2 + \alpha_6 x_{2t} x_{3t} + v_t$

30
Performing Whites Test for Heteroskedasticity


3. Obtain $R^2$ from the auxiliary regression and multiply it by the number of observations, T. It can be shown that
$T \cdot R^2 \sim \chi^2(m)$
where m is the number of regressors in the auxiliary regression excluding the constant term.

4. If the $\chi^2$ test statistic from step 3 is greater than the corresponding value from the statistical table, then reject the null hypothesis that the disturbances are homoskedastic.
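In terms of the auxiliary regression given in step 2, the joint null hypothesis being tested is that none of the regressors, their squares or their cross-product help to explain the squared residuals:
\[
H_0: \alpha_2 = \alpha_3 = \alpha_4 = \alpha_5 = \alpha_6 = 0
\]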


31

Consequences of Using OLS in the Presence of
Heteroskedasticity
OLS estimation still gives unbiased coefficient estimates, but they are no
longer BLUE.

This implies that if we still use OLS in the presence of heteroskedasticity,
our standard errors could be inappropriate and hence any inferences we
make could be misleading.

Whether the standard errors calculated using the usual formulae are too big
or too small will depend upon the form of the heteroskedasticity.

32

How Do we Deal with Heteroskedasticity?

If the form (i.e. the cause) of the heteroskedasticity is known, then we can use an estimation method which takes this into account (called generalised least squares, GLS).
A simple illustration of GLS is as follows: Suppose that the error variance is related to another variable $z_t$ by
$Var(u_t) = \sigma^2 z_t^2$
To remove the heteroskedasticity, divide the regression equation by $z_t$:
\[
\frac{y_t}{z_t} = \beta_1 \frac{1}{z_t} + \beta_2 \frac{x_{2t}}{z_t} + \beta_3 \frac{x_{3t}}{z_t} + v_t
\]
where $v_t = \frac{u_t}{z_t}$ is an error term.

Now $Var(v_t) = Var\left(\frac{u_t}{z_t}\right) = \frac{Var(u_t)}{z_t^2} = \frac{\sigma^2 z_t^2}{z_t^2} = \sigma^2$ for known $z_t$.

33

Other Approaches to Dealing
with Heteroscedasticity
So the disturbances from the new regression equation will be
homoskedastic.

Other solutions include:
1. Transforming the variables into logs or reducing by some other measure
of size.
2. Use White's heteroskedasticity-consistent standard error estimates.
The effect of using White's correction is that in general the standard errors for the slope coefficients are increased relative to the usual OLS standard errors. This makes us more conservative in hypothesis testing, so that we would need more evidence against the null hypothesis before we would reject it.

Practice in financial research: No. 2 above, plus the use of Newey-West standard errors (also available in EViews).
34
SAS application
Following on from the previous example on the CAPM model for BP stock, in the SAS program BP_CAPM_SAS, add these command lines:

/* To perform White's test on Heteroskedasticity */
PROC MODEL;
PARMS B0 B1;
rbp = B0 + B1*rm;
FIT rbp /WHITE;
RUN;

- Save the program, then click the small running-man icon to run the program (make sure you have the bps107 data file imported and have run the program from the beginning, as we did in Module 2.1).
35
You'll have this in the output panel. The number 0.99 is the White test statistic.
36

E-Views application
First, plot the residuals to see if there is a systematic pattern visually.
Based on the BP CAPM example earlier.
Run the EViews application. Click File, Open, and Workfile. Then, specify the file BP CAPM that you have saved earlier. You'll have:


37


Double click on the CAPM equation object and you'll see the regression output that we had earlier. Then, click on View, Actual, Fitted, Residual, and Actual, Fitted, Residual Graph:

38


You'll get the following graph:















Do you see any systematic pattern in the residuals as X changes over the sample?

39


Now, let's get on with the formal White test: on the CAPM equation object, click View, Residual Tests, White Heteroskedasticity (no cross terms):















40


You'll get the following results from the auxiliary regression:
















41


Results:

- The test statistic is $T \cdot R^2$, which is given as 0.9868, while the critical value for Chi-square(m = 3) is 7.815. Thus we CANNOT reject the null hypothesis of homoskedasticity in this case.

- In other words, it is somewhat plausible to assume that the variance of the errors is constant (we can be less worried about heteroskedasticity in the BP CAPM example).

- This is also consistent with the fact that there is no significant coefficient for RFTAS or RFTAS^2 in the auxiliary regression. That is, there is no systematic relationship between the squared residuals of the regression and the explanatory variable(s), in either linear or quadratic form.














42


Newey-West t-statistics:
- In any case, with or without White tests, some financial researchers just present the Newey-West t-statistics in their results.
- The Newey-West t-statistic takes into account the problem of heteroskedasticity as well as autocorrelation!
- In general (most cases), the Newey-West t-statistic is a conservative number, as it will be lower than the normal OLS t-statistic.
- You can easily obtain Newey-West t-statistics in EViews. From the previous example in Module 2.1 (CAPM on BP example), open that EViews workfile again. You should have:













43



Choose Objects, New Object, Equation. Then give it a name in the dialog box (say newcapm), then OK.













44



Repeat the same steps as when we created a regression earlier. But this time, do not click OK. Click Options instead.












45



Then choose Heteroskedasticity Consistent Coefficient Covariance, Newey-West, and OK. You'll get the output as on the next page:













46











































































- Notice that the t-value has now decreased to 5.18 (it was 5.85 previously)!!
- Newey-West on SAS is tricky and not required for 125.785 students.












47

Violating Assumption 3: $Cov(u_i, u_j) = 0$
Assump #3: The covariance between the error terms over time (or cross-sectionally) is zero
Theme: Autocorrelation

We assumed of the CLRM's errors that $Cov(u_i, u_j) = 0$ for $i \neq j$.
This is essentially the same as saying there is no pattern in the errors.

Obviously we never have the actual u's, so we use their sample counterpart, the residuals, $\hat{u}_t$.

If there are patterns in the residuals from a model, we say that they are
autocorrelated.

Some stereotypical patterns we may find in the residuals are given on the
next 3 slides.
48
Positive Autocorrelation










Positive Autocorrelation is indicated by a cyclical residual plot over time.
49
[Figures: $\hat{u}_t$ plotted against $\hat{u}_{t-1}$, and $\hat{u}_t$ plotted over time, both showing positive autocorrelation]
Negative Autocorrelation









Negative autocorrelation is indicated by an alternating pattern where the residuals
cross the time axis more frequently than if they were distributed randomly
50
[Figures: $\hat{u}_t$ plotted against $\hat{u}_{t-1}$, and $\hat{u}_t$ plotted over time, both showing negative autocorrelation]
No pattern in residuals
No autocorrelation










No pattern in residuals at all: this is what we would like to see
51
[Figures: $\hat{u}_t$ plotted against $\hat{u}_{t-1}$, and $\hat{u}_t$ plotted over time, showing no pattern]
Detecting Autocorrelation:
The Durbin-Watson Test

The Durbin-Watson (DW) test is a test for first order autocorrelation - i.e. it assumes that the relationship is between an error and the previous one:
$u_t = \rho u_{t-1} + v_t$   (1)
where $v_t \sim N(0, \sigma_v^2)$.
The DW test statistic actually tests
H0: $\rho = 0$ and H1: $\rho \neq 0$
The test statistic is calculated by
\[
DW = \frac{\sum_{t=2}^{T} (\hat{u}_t - \hat{u}_{t-1})^2}{\sum_{t=2}^{T} \hat{u}_t^2}
\]
52
The Durbin-Watson Test:
Critical Values
We can also write
$DW \approx 2(1 - \hat{\rho})$   (2)

where $\hat{\rho}$ is the estimated correlation coefficient. Since $\hat{\rho}$ is a correlation, it implies that $-1 \leq \hat{\rho} \leq +1$.
Rearranging for DW from (2) would give $0 \leq DW \leq 4$.

If $\hat{\rho} = 0$, DW = 2. So roughly speaking, do not reject the null hypothesis if DW is near 2, i.e. there is little evidence of autocorrelation.
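A quick sketch of where approximation (2) comes from: expand the numerator of the DW statistic and note that, for a large sample, $\sum \hat{u}_{t-1}^2 \approx \sum \hat{u}_t^2$:
\[
DW = \frac{\sum (\hat{u}_t - \hat{u}_{t-1})^2}{\sum \hat{u}_t^2}
= \frac{\sum \hat{u}_t^2 - 2\sum \hat{u}_t \hat{u}_{t-1} + \sum \hat{u}_{t-1}^2}{\sum \hat{u}_t^2}
\approx 2\left(1 - \frac{\sum \hat{u}_t \hat{u}_{t-1}}{\sum \hat{u}_t^2}\right) = 2(1 - \hat{\rho})
\]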

Unfortunately, DW has 2 critical values, an upper critical value ($d_U$) and a lower critical value ($d_L$), and there is also an intermediate region where we can neither reject nor not reject H0. (Critical values can be obtained from Table D.5, pp. 888-891 of Gujarati's textbook.)
53

The Durbin-Watson Test: Interpreting the Results











Conditions which Must be Fulfilled for DW to be a Valid Test
1. Constant term in regression
2. Regressors are non-stochastic
3. No lags of dependent variable

54

Another Test for Autocorrelation:
The Breusch-Godfrey Test

It is a more general test for r-th order autocorrelation:
$u_t = \rho_1 u_{t-1} + \rho_2 u_{t-2} + \rho_3 u_{t-3} + \ldots + \rho_r u_{t-r} + v_t$,   $v_t \sim N(0, \sigma_v^2)$
The null and alternative hypotheses are:
H0: $\rho_1 = 0$ and $\rho_2 = 0$ and ... and $\rho_r = 0$
H1: $\rho_1 \neq 0$ or $\rho_2 \neq 0$ or ... or $\rho_r \neq 0$
The test is carried out as follows:
1. Estimate the linear regression using OLS and obtain the residuals, $\hat{u}_t$.

2. Regress $\hat{u}_t$ on all of the regressors from stage 1 (the x's) plus $\hat{u}_{t-1}, \hat{u}_{t-2}, \ldots, \hat{u}_{t-r}$. Obtain $R^2$ from this regression.
3. It can be shown that $(T - r)R^2 \sim \chi^2(r)$.
If the test statistic exceeds the critical value from the statistical tables, reject the null (joint) hypothesis of no autocorrelation (that is, we might need to be aware of an autocorrelation problem).
55

Consequences of Ignoring Autocorrelation
if it is Present

The coefficient estimates derived using OLS are still unbiased, but they are
inefficient, i.e. they are not BLUE, even in large sample sizes.

Thus, if the standard error estimates are inappropriate, there exists the possibility that we could make the wrong inferences (e.g. the SE of the estimated coefficient is biased downwards, and thus the t-stat is biased upwards, in the case of positive autocorrelation).

$R^2$ is likely to be inflated relative to its 'correct' value for positively correlated residuals.


56

Remedies for Autocorrelation

Time series analyses (covered more extensively in Module 4 by Dr. JG Chen).

If the form of the autocorrelation is known, we could use a GLS procedure
i.e. an approach that allows for autocorrelated residuals e.g., Cochrane-Orcutt.
But such procedures that correct for autocorrelation require assumptions
about the form of the autocorrelation.
If these assumptions are invalid, the cure would be more dangerous than the
disease! - see Hendry and Mizon (1978).

However, it is unlikely to be the case that the form of the autocorrelation is
known, and a more modern view is that residual autocorrelation presents an
opportunity to modify the regression (e.g. lagged regressors in time series).

Practice in financial research: Use Newey-West standard errors (they deal with both heteroskedasticity and autocorrelation).


57

Problems with Adding Lagged Regressors
to Cure Autocorrelation
Inclusion of lagged values of the dependent variable violates the
assumption that the RHS variables are non-stochastic.

What does an equation with a large number of lags actually mean?

Note that if there is still autocorrelation in the residuals of a model
including lags, then the OLS estimators will not even be consistent.

58
SAS application
Continue on in the SAS program BP_CAPM_SAS, add these
command lines:

/* To perform Durbin Watson test */

proc reg data=capm_bp;
model rbp = rm/dw;
run;

- Save the program, then click the small man icon to run the
program (you may now highlight just the above lines).

59
The results will be as follows. The number 1.876 is the Durbin-Watson statistic.
60

E-Views application
By default, EViews gives us the Durbin-Watson statistic when we run an OLS regression. From the previous BP-CAPM example, we had:













Now, since the Durbin-Watson statistic is 1.88 (close to 2), there is little evidence that a first-order autocorrelation problem exists in this example.


61
To be more conservative, you can try the Breusch-Godfrey test to investigate autocorrelation in residuals beyond the first order.
In this case, let's try the BP-CAPM example up to 10 lags.
Double click on the CAPM equation object again. Then, click View, Residual Tests, but this time choose Serial Correlation LM test.
A small box will appear asking you for the number of lags; put 10 and OK. You'll get the results on the following page.

The test statistic is 20.33, higher than the critical value (at the 5% sig. level) of 18.31 obtained from the table (Chi-square(10)), meaning that we have to reject the null hypothesis of no autocorrelation at any lag of the residual terms. The Probability also gives away this information anyway.
A further look tells us that even though there is no autocorrelation at the first order, as correctly stated by the Durbin-Watson test, there exists some autocorrelation at the 2nd and 5th order, for example.









62













63
SAS application
- Continue on with the same SAS program. Add
these command lines:

/* To perform Breusch Godfrey test allowing for 10 lags */

proc autoreg data=capm_bp;
model rbp=rm/godfrey=10;
run;

- Save the program, then click the small man icon to run the program (you
may now highlight just the above lines).

64
You'll get this as the output. With 10 lags, the BG statistic is 20.33. Notice that when you do this, the DW statistic is also reported anyway.
65
Violating Assumption 4: $Cov(u_t, X_t) = 0$
Assump #4: Explanatory variables are non-stochastic
Theme: Endogeneity problem or Simultaneity
bias
Diagnostic test: - Hausman Test or Suggestions from well-documented finance theories

Causes: In the CLRM, we assume that the explanatory variables, X's, are non-stochastic (non-random). That is, they are fixed in repeated samples. Thus, they should be determined outside the model (exogenous). If this is violated, there exists an endogeneity problem.

Consequences: - Biases in coefficient estimates

Remedy: - Two Stage Least Square (2SLS), Instrumental variable regression (IV regressions),
Vector Autoregressive Model (VAR).

Note: Multivariate models are not covered in our paper. Nevertheless, we strongly suggest students read Chapters 18-20 in Gujarati's or Chapter 6 in Brooks for their own knowledge.


66
Violating Assumption 5: $u_t$ are normally distributed
Assump #5: the error terms are normally distributed
Theme: Non-normality
Diagnostic test: - Bera-Jarque test
Causes: Mostly, 1) the sample size is too small
2) there are outliers in the data

Consequences: - Single or joint hypothesis tests about the model parameters may be invalid!
Remedy: 1) Increase the sample size, or use non-parametric statistics if the sample size is small
2) Remove outlier effects through dummy variables (discussed in the second half of the semester) or data filtering (safer to follow renowned papers in the same area). The typical dummy variable in investment research is the Oct 1987 stock market crash. ($R^2$ may improve too!)
3) In finance studies (esp. those involving returns), the higher the frequency of the data (e.g. daily returns instead of monthly returns), the higher the chance it would be normal.
67

Testing the Normality Assumption

Why did we need to assume normality for hypothesis testing?

Testing for Departures from Normality

The Bera-Jarque normality test
A normal distribution is not skewed and is defined to have a coefficient of kurtosis of 3.
The kurtosis of the normal distribution is 3, so its excess kurtosis ($b_2 - 3$) is zero.
Skewness and kurtosis are the (standardised) third and fourth moments of a
distribution.

68
Normal versus Skewed Distributions










A normal distribution A skewed distribution

69
Leptokurtic versus Normal Distribution
(financial return series tend to be leptokurtic, the taller one)

70
Testing for Normality

Bera and Jarque formalise this by testing the residuals for normality: testing whether the coefficient of skewness and the coefficient of excess kurtosis are jointly zero.
It can be proved that the coefficients of skewness and kurtosis can be expressed respectively as:
\[
b_1 = \frac{E[u^3]}{(\sigma^2)^{3/2}} \qquad \text{and} \qquad b_2 = \frac{E[u^4]}{(\sigma^2)^{2}}
\]
The Bera-Jarque test statistic is given by
\[
W = T\left[ \frac{b_1^2}{6} + \frac{(b_2 - 3)^2}{24} \right] \sim \chi^2(2)
\]
We estimate $b_1$ and $b_2$ using the residuals from the OLS regression, $\hat{u}$.
71
SAS application
- Continue on with the same SAS program. Add these
command lines:

/* To perform Jarque-Bera normality test*/

proc autoreg data=capm_bp;
model rbp=rm/normal;
run;

- Save the program, then click the small man icon to run the
program (you may now highlight just the above lines).

72
And the output: the normality (Jarque-Bera) test statistic is 33.89.
73
E-Views application

Let's perform the Bera-Jarque normality test on our previous BP-CAPM example through View, Residual Tests, Histogram-Normality Test.
We'll get:

74


Now, the higher the Jarque-Bera statistic, the stronger the evidence that the residual terms are NOT normally distributed. Looking at the Probability would also yield the answer. (Recall that the null hypothesis is normality.)
In the BP-CAPM case, the histogram is not exactly bell-shaped. The Jarque-Bera statistic is high, meaning that the skewness and excess kurtosis coefficients are not jointly zero. The null hypothesis of normality needs to be rejected.
That is, one might need to be aware of a non-normality problem in this particular BP-CAPM market model.

75
Implicit CLRM Assumptions and their
violations
Besides the five formal CLRM assumptions, there are about five more implicit assumptions of the CLRM. Violations of these conditions are also considered serious problems in econometrics.

Five implicit CLRM assumptions:
1) Explanatory variables are not correlated to one another - Multicollinearity
problem
2) The appropriate functional form of the regression is linear - Misspecification
problem
3) There is no omission of an important explanatory variable - Omitted variable
problem (also a misspecification problem)
4) There is no inclusion of irrelevant variables - 'Torturing regression' problem (Jeff's terminology)
5) The regression coefficient parameters are constant through the entire study
period - Parameter Stability problem (sub-period robustness)

76
Violating Implicit Assumption 1:
Explanatory variables are not correlated to one another
Theme: Multicollinearity

Diagnostic test: - Pairwise correlation matrix (although informal), or hints from previous empirical findings in the field
Causes: Some explanatory variables are correlated with one another.

Consequences: - High $R^2$ but insignificant coefficients!
- The regression model is very sensitive to adding or removing explanatory variables.

Remedy: - discussed in later slide

77

Measuring Multicollinearity

The easiest way to measure the extent of multicollinearity is simply to look at the matrix of correlations between the individual variables, e.g.

Corr   x2    x3    x4
x2     -     0.2   0.8
x3     0.2   -     0.3
x4     0.8   0.3   -

But another problem arises if 3 or more variables are linearly related
- e.g. $x_{2t} + x_{3t} = x_{4t}$

Note that high correlation between y and one of the x's is not multicollinearity.


78
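In SAS, the pairwise correlation matrix can be obtained with PROC CORR. A minimal sketch, assuming a dataset called mydata containing explanatory variables named x2, x3 and x4 (hypothetical names, following the example above):

/* Pairwise correlation matrix as an informal multicollinearity check */
proc corr data=mydata;
var x2 x3 x4;   /* explanatory variables only - leave y out */
run;

High pairwise correlations (like the 0.8 between x2 and x4 in the example above) are an informal warning sign, though as noted they cannot detect linear relationships involving three or more variables.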

Multicollinearity

This problem occurs when the explanatory variables are very highly correlated
with each other.

Perfect multicollinearity
Cannot estimate all the coefficients
- e.g. suppose $x_3 = 2 x_2$
and the model is $y_t = \beta_1 + \beta_2 x_{2t} + \beta_3 x_{3t} + \beta_4 x_{4t} + u_t$

Problems if Near Multicollinearity is Present but Ignored
- $R^2$ will be high but the individual coefficients will have high standard errors.
- The regression becomes very sensitive to small changes in the specification.
- Thus confidence intervals for the parameters will be very wide, and
significance tests might therefore give inappropriate conclusions.
79

Remedies to the Problem of Multicollinearity

Traditional approaches, such as ridge regression or principal components. But
these usually bring more problems than they solve.
Some econometricians argue that if the model is otherwise OK, just ignore it
The easiest ways to cure the problems are
- drop one of the collinear variables
- transform the highly correlated variables into a ratio
- go out and collect more data e.g.
- a longer run of data
- switch to a higher frequency

- In financial research: Use panel datasets (special and widely used techniques in finance will be discussed in Module 2.3). Sometimes, Principal Component Analysis is used (e.g. Chen, Roll, and Ross's APT pricing model). We can also perform multiple versions of regression models to see the sensitivity of regression coefficients (e.g. like Rozeff (1982)).

80

Violating Implicit Assumption 2:
Adopting the Wrong Functional Form

Theme: Misspecification

Diagnostic test: - Ramsey's RESET test, which is a general test for misspecification of functional form

Causes: In ordinary OLS, we assume that the appropriate functional form is linear while
actually it may not be true.

Consequences: - Biased estimates altogether.

Remedy: - Use non-linear models if doable and supported by finance theories

81
Ramsey's RESET test
Essentially the method works by adding higher order terms of the fitted values (e.g. $\hat{y}_t^2$, $\hat{y}_t^3$ etc.) into an auxiliary regression:
Regress $\hat{u}_t$ on powers of the fitted values:
\[
\hat{u}_t = \beta_0 + \beta_1 \hat{y}_t^2 + \beta_2 \hat{y}_t^3 + \ldots + \beta_{p-1} \hat{y}_t^p + v_t
\]
Obtain $R^2$ from this regression. The test statistic is given by $TR^2$ and is distributed as a $\chi^2(p-1)$.

So if the value of the test statistic is greater than a $\chi^2(p-1)$ critical value, then reject the null hypothesis that the functional form was correct.
82
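One way to run Ramsey's RESET test in SAS is to construct the auxiliary regression by hand, following the steps above. A minimal sketch in the spirit of the earlier SAS applications, assuming the same capm_bp dataset and rbp/rm variables from the BP CAPM example (the dataset name reset_data and the variable names yhat, uhat, yhat2, yhat3 are hypothetical):

/* Ramsey's RESET test by hand (sketch): powers of fitted values up to p = 3 */

proc reg data=capm_bp;
model rbp = rm;
output out=reset_data p=yhat r=uhat;   /* save fitted values and residuals */
run;

data reset_data;
set reset_data;
yhat2 = yhat**2;   /* squared fitted values */
yhat3 = yhat**3;   /* cubed fitted values   */
run;

/* Auxiliary regression of the residuals on powers of the fitted values;   */
/* compare T times the R-square from this regression with chi-square(p-1). */
proc reg data=reset_data;
model uhat = yhat2 yhat3;
run;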
Violating Implicit Assumption 3:
Omission of important variable(s)

Theme: Omitted Variables
Diagnostic test: - None
Cause: Some explanatory variable(s) are missing from the regression model

Consequences: - Possibly somewhat significant coefficients but low $R^2$
- Biases and inconsistency in coefficient estimates, unless the excluded variable is uncorrelated with all the included variables.
- The intercept term will be biased (unconditionally)
- Std. errors of the estimated coefficients may be too large
Remedy: - In finance, we include control variables, the variables that are known (from previous studies) to affect the dependent variable. (For example, if we are interested in testing the inverse relation between the M/B ratio and Firm Age, then we should include other variables/factors that determine the M/B ratio of a firm (but ideally not too highly correlated with Firm Age), including Total Assets, Debt ratio, Industry, and others. If you are interested in explaining the bid-ask spread, you need to control for activity, competition, risk, and information variables.)

83
Violating Implicit Assumption 4:
Inclusion of irrelevant variable(s)

Theme: Torturing Regression (Jeff's terminology)

Diagnostic test: - None (a regression with very many explanatory variables raises eyebrows)
Cause: Unnecessary explanatory variable(s) are thrown into a regression model

Consequences: - The std. errors of the estimated coefficients are larger than they need to be (estimates are consistent and unbiased, BUT inefficient, in econometrics terms)
- As a result, some marginally significant variables will lose their significance altogether.


Remedy: - Naturally, try to keep the number of explanatory variables reasonable and include only those supported by finance theories
84
Violating Implicit Assumption 5:
regression coefficient parameters are constant throughout the entire study
period

Theme: Parameter Stability or Sub-period
analysis
Diagnostic test: - Chow test, Predictive failure test
Cause: - Time-varying relationship especially with structural shifts

Consequences: - The regression results may be biased and not justifiable since they are
not robust through time.
- The study may also be subject to Data Mining critique (e.g.
authors select to report only certain short time period that
advocates the model) without robustness check

Remedy: - Perform Chow test as a robustness check
- Provide sub-period results (becoming more common in empirical finance)


85

Parameter Stability Tests

So far, we have estimated regressions such as
$y_t = \beta_1 + \beta_2 x_{2t} + \beta_3 x_{3t} + u_t$

We have implicitly assumed that the parameters ($\beta_1$, $\beta_2$ and $\beta_3$) are constant for the entire sample period.

We can test this implicit assumption using parameter stability tests. The
idea is essentially to split the data into sub-periods and then to estimate up
to three models, for each of the sub-parts and for all the data and then to
compare the RSS of the models.

There are two types of test we can look at:
- Chow test (analysis of variance test)
- Predictive failure tests

86

The Chow Test
The steps involved are:
1. Split the data into two sub-periods. Estimate the regression over the whole
period and then for the two sub-periods separately (3 regressions). Obtain the
RSS for each regression.
2. The restricted regression is now the regression for the whole period while
the unrestricted regression comes in two parts: for each of the sub-samples.
We can thus form an F-test which is the difference between the RSSs.

The statistic is
\[
\frac{RSS - (RSS_1 + RSS_2)}{RSS_1 + RSS_2} \times \frac{T - 2k}{k}
\]
[Recall the F-test statistic: $\frac{RRSS - URSS}{URSS} \times \frac{T-k}{m}$]
87
The Chow Test (contd)

where:
RSS = RSS for the whole sample
$RSS_1$ = RSS for sub-sample 1
$RSS_2$ = RSS for sub-sample 2
T = number of observations
2k = number of regressors in the unrestricted regression (since it comes
in two parts)
k = number of regressors in (each part of the) unrestricted regression

3. Perform the test. If the value of the test statistic is greater than the
critical value from the F-distribution, which is an F(k, T-2k), then reject
the null hypothesis that the parameters are stable over time.


88

A Chow Test Example

Consider the following regression for the CAPM beta (again) for the returns on Glaxo.

Say that we are interested in estimating beta for monthly data from 1981-1992. The fitted model for each period is:

1981M1 - 1987M10:   $\hat{y}_t$ = 0.24 + 1.2 $R_{Mt}$,    T = 82,    $RSS_1$ = 0.03555
1987M11 - 1992M12:  $\hat{y}_t$ = 0.68 + 1.53 $R_{Mt}$,   T = 62,    $RSS_2$ = 0.00336
1981M1 - 1992M12:   $\hat{y}_t$ = 0.39 + 1.37 $R_{Mt}$,   T = 144,   RSS = 0.0434

89
A Chow Test Example - Results

The null hypothesis is
$H_0: \alpha_1 = \alpha_2$ and $\beta_1 = \beta_2$

The unrestricted model is the model where this restriction is not imposed.

Test statistic = $\frac{0.0434 - (0.0355 + 0.00336)}{0.0355 + 0.00336} \times \frac{144 - 4}{2}$ = 7.698

Compare with 5% F(2,140) = 3.06

We reject H
0
at the 5% level and say that we reject the restriction that the
coefficients are the same in the two periods.
90
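A Chow test can also be requested directly in SAS. A minimal sketch in the spirit of the earlier SAS applications, assuming the capm_bp dataset and rbp/rm variables from the BP CAPM example, and assuming the candidate break point is placed at observation 83 of the sample (a hypothetical choice):

/* To perform a Chow test with a candidate break point at observation 83 */

proc autoreg data=capm_bp;
model rbp = rm / chow=(83);
run;

Here CHOW=(83) requests the Chow F-statistic for a break at the listed observation; PROC AUTOREG also offers a PCHOW= option for a predictive (failure) version of the test.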

The Predictive Failure Test

A problem with the Chow test is that we need to have enough data to do the regression on both sub-samples, i.e. $T_1 \gg k$, $T_2 \gg k$.
An alternative formulation is the predictive failure test.
What we do with the predictive failure test is estimate the regression over a 'long' sub-period (i.e. most of the data) and then predict values for the other period and compare the two.
To calculate the test:
- Run the regression for the whole period (the restricted regression) and obtain the RSS
- Run the regression for the 'large' sub-period and obtain the RSS (called $RSS_1$). Note we call the number of observations $T_1$ (even though it may come second).
\[
\text{Test Statistic} = \frac{RSS - RSS_1}{RSS_1} \times \frac{T_1 - k}{T_2}
\]
where $T_2$ = number of observations we are attempting to predict. The test statistic will follow an $F(T_2, T_1 - k)$.
91
Backwards versus Forwards Predictive Failure Tests

There are 2 types of predictive failure tests:

- Forward predictive failure tests, where we keep the last few observations
back for forecast testing, e.g. we have observations for 1970Q1-1994Q4.
So estimate the model over 1970Q1-1993Q4 and forecast 1994Q1-1994Q4.

- Backward predictive failure tests, where we attempt to back-cast the
first few observations, e.g. if we have data for 1970Q1-1994Q4, and we
estimate the model over 1971Q1-1994Q4 and backcast 1970Q1-1970Q4.

92

Predictive Failure Tests An Example

We have the following models estimated:
For the CAPM beta on Glaxo(!).
1980M1-1991M12:   $\hat{y}_t$ = 0.39 + 1.37 $R_{Mt}$,   T = 144,   RSS = 0.0434
1980M1-1989M12:   $\hat{y}_t$ = 0.32 + 1.31 $R_{Mt}$,   $T_1$ = 120,   $RSS_1$ = 0.0420
Can this regression adequately forecast the values for the last two years?
\[
\text{Test Statistic} = \frac{0.0434 - 0.0420}{0.0420} \times \frac{120 - 2}{24} = 0.164
\]
Compare with F(24,118) = 1.66.
So we do not reject the null hypothesis that the model can adequately predict the last few observations.
93

How do we decide the sub-parts to use?

As a rule of thumb, we could use all or some of the following:
- Plot the dependent variable over time and split the data according to any obvious structural changes in the series, e.g.
[Figure: value of the series ($y_t$) plotted against the sample period, showing an obvious structural change]

- Split the data according to any known important
historical events (e.g. stock market crash, new government elected)
- Use all but the last few observations and do a predictive failure test on those.
94

A classic example of multiple regression:
Determinants of Dividend Payout Ratios

Rozeff (JFR 1982)
- to be discussed in the class (internal)
- to be discussed at the block course (block
students)
95
