
University of Hong Kong

Introductory Econometrics (ECON0701), Spring 2014


24 February 2014

Administrative Matters

You are probably wondering about the scope of the midterm examination, which will
be held on Monday, March 17th.
It will cover chapters 2, 3, 4, parts of 6, and 7 (i.e., up to Monday, March 3rd;
chapter 6 is covered only to the extent discussed in lecture).
Also, you may have with you one A4 sheet of notes, handwritten on one side.
The definition of handwritten is: the note sheet must be produced with no technology
other than a pen or pencil.

Multiple Regression Analysis: Inference

Last time we discussed testing hypotheses that involve more than one parameter.
The method is to define a third parameter, θ, that is zero when the hypothesis you
want to test is true. For example, to test βi = βj, you would define

θ = βi − βj

Then manipulate the equation so you can estimate θ directly, and test the hypothesis
that this third parameter is equal to zero.

Multiple Regression Analysis: Inference

For example, consider the model

y = β0 + β1 x1 + β2 x2 + u

and we want to test the hypothesis that β1 = β2.
Substitute θ = β1 − β2 for β1 (that is, β1 = θ + β2):

y = β0 + (θ + β2) x1 + β2 x2 + u
Multiple Regression Analysis: Inference

For a second example, we will look at campaign finance expenditures.
The research question is whether there is evidence that candidates' expenditures
exactly offset each other.
The specific hypothesis is that if candidate A increases her spending by some
proportion, and candidate B increases his spending by the same proportion, the result
would be the same as if neither had increased their expenditures.

Multiple Regression Analysis: Inference

The specific model we have in mind is

voteA = β0 + β1 ln(expendA) + β2 ln(expendB) + β3 partystrA + u

where voteA is the percent of the vote received by candidate A, expendA and expendB
are A's and B's campaign expenditures, and partystrA is a measure of the strength of
A's party (the percent voting for A's party in the last election).

Multiple Regression Analysis: Inference

. d

Contains data from vote1.dta
  obs:           173
 vars:            10                          25 Jun 1999 14:07
 size:         5,190 (99.3% of memory free)
-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
voteA           byte   %5.2f                  percent vote for A
expendA         float  %8.2f                  camp. expends. by A, $1000s
expendB         float  %8.2f                  camp. expends. by B, $1000s
prtystrA        byte   %5.2f                  % vote for president
lexpendA        float  %9.0g                  log(expendA)
lexpendB        float  %9.0g                  log(expendB)
-------------------------------------------------------------------------------
Sorted by:
Multiple Regression Analysis: Inference

First let's take a look at the regression result.

. regress voteA lexpendA lexpendB prtystrA

      Source |       SS       df       MS              Number of obs =     173
-------------+------------------------------           F(  3,   169) =  215.23
       Model |  38405.1096     3  12801.7032           Prob > F      =  0.0000
    Residual |  10052.1389   169    59.480112          R-squared     =  0.7926
-------------+------------------------------           Adj R-squared =  0.7889
       Total |  48457.2486   172  281.728189           Root MSE      =  7.7123

------------------------------------------------------------------------------
       voteA |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    lexpendA |   6.083316     .38215    15.92   0.000     5.328914    6.837719
    lexpendB |  -6.615417   .3788203   -17.46   0.000    -7.363246   -5.867588
    prtystrA |   .1519574   .0620181     2.45   0.015     .0295274    .2743873
       _cons |   45.07893   3.926305    11.48   0.000     37.32801    52.82985
------------------------------------------------------------------------------

Multiple Regression Analysis: Inference

The hypothesis that we want to test is that proportional increases in spending by
candidate B exactly offset proportional increases in spending by candidate A, i.e.,
that β2 = −β1.
If this is true, then β1 + β2 = 0.
Therefore, let the parameter θ equal β1 + β2, and substitute β1 = θ − β2.

Multiple Regression Analysis: Inference


θ = β1 + β2
β1 = θ − β2

voteA = β0 + β1 ln(expendA) + β2 ln(expendB) + β3 partystrA + u

voteA = β0 + (θ − β2) ln(expendA) + β2 ln(expendB) + β3 partystrA + u

voteA = β0 + θ ln(expendA) + β2 [ln(expendB) − ln(expendA)] + β3 partystrA + u
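The reparameterization is a pure algebraic identity, so the two forms of the model fit the data identically. A minimal numeric check in Python (the coefficient values and data point below are made up purely for illustration, not estimates):

```python
import math

# Hypothetical coefficient values (for illustration only, not estimates)
b0, b1, b2, b3 = 40.0, 6.0, -6.6, 0.15
theta = b1 + b2  # the new parameter: theta = beta1 + beta2

# One hypothetical observation
expendA, expendB, partystrA = 300.0, 200.0, 50.0
lA, lB = math.log(expendA), math.log(expendB)

# Original parameterization
original = b0 + b1 * lA + b2 * lB + b3 * partystrA

# Reparameterized form: theta on ln(expendA), beta2 on ln(expendB) - ln(expendA)
reparam = b0 + theta * lA + b2 * (lB - lA) + b3 * partystrA

print(abs(original - reparam) < 1e-12)  # True: the two forms are identical
```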
Multiple Regression Analysis: Inference

We can run this regression by creating a new variable lexpendBA equal to
lexpendB − lexpendA:

. regress voteA lexpendA lexpendBA prtystrA

      Source |       SS       df       MS              Number of obs =     173
-------------+------------------------------           F(  3,   169) =  215.23
       Model |  38405.1097     3  12801.7032           Prob > F      =  0.0000
    Residual |  10052.1388   169   59.4801115          R-squared     =  0.7926
-------------+------------------------------           Adj R-squared =  0.7889
       Total |  48457.2486   172  281.728189           Root MSE      =  7.7123

------------------------------------------------------------------------------
       voteA |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    lexpendA |   -.532101   .5330858    -1.00   0.320    -1.584466    .5202638
   lexpendBA |  -6.615417   .3788203   -17.46   0.000    -7.363246   -5.867588
    prtystrA |   .1519574   .0620181     2.45   0.015     .0295274    .2743873
       _cons |   45.07893   3.926305    11.48   0.000     37.32801    52.82985
------------------------------------------------------------------------------
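The coefficient on lexpendA in this second regression estimates θ = β1 + β2, so it should equal the sum of the two expenditure coefficients from the first regression. A quick check on the numbers printed in the output:

```python
b1_hat = 6.083316      # coefficient on lexpendA from the first regression
b2_hat = -6.615417     # coefficient on lexpendB from the first regression
theta_hat = -0.532101  # coefficient on lexpendA from the reparameterized regression

# theta = beta1 + beta2, so the estimates must match (up to rounding in the output)
print(abs((b1_hat + b2_hat) - theta_hat) < 1e-6)  # True
```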

Multiple Regression Analysis: Inference

This suggests the following general strategy for testing a hypothesis that is a
function of more than one parameter.
Define a parameter θ that is equal to zero when the hypothesis is true.
For example, if the hypothesis is β1 + β2 = 3, subtract the 3 from both sides and
define θ = β1 + β2 − 3.
Isolate one parameter in terms of θ and the other parameters; for example,
β1 = θ − β2 + 3.

Multiple Regression Analysis: Inference

Substitute for the parameter you isolated (in this example, β1) in the regression
equation.
This will suggest another regression to run, where θ is the coefficient on one of the
variables and the remaining β parameters are coefficients on the transformed
variables.
Run that regression and see if you can reject the hypothesis that θ = 0. If you can,
you can reject the original hypothesis, since we picked θ to be equal to zero when
the original hypothesis is true.



Multiple Regression Analysis: Inference

This same idea can be used to test hypotheses about the predicted value from a
regression.
For example, suppose we use a multiple regression model to predict students' average
GPA, given their SAT scores and some characteristics of their high school.
How accurate is this prediction?
The idea is to define a parameter that will be equal to the prediction, given certain
values of the independent variables; then we have all the information we need (the
standard error) to do hypothesis testing about the prediction.

Multiple Regression Analysis: Inference

For example, suppose we have the following linear regression model for GPA in
college:

colGPA = β0 + β1 SAT + β2 hsperc + β3 hsize + β4 hsize² + u

where colGPA is the student's college GPA, SAT is the student's SAT score, hsperc is
the student's percentile ranking in high school (i.e., 1 = top 1%), and hsize is the size
of the student's high school, in hundreds.

Multiple Regression Analysis: Inference

. d

Contains data from gpa2.dta
  obs:         4,137
 vars:            12                          25 May 2002 14:39
 size:       157,206 (98.5% of memory free)
---------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
---------------------------------------------------------------------------
sat             int    %10.0g                 combined SAT score
tothrs          int    %10.0g                 total hours through fall semest
colgpa          float  %9.0g                  GPA after fall semester
athlete         byte   %8.0g                  =1 if athlete
verbmath        float  %9.0g                  verbal/math SAT score
hsize           double %10.0g                 size grad. class, 100s
hsrank          int    %10.0g                 rank in grad. class
hsperc          float  %9.0g                  high school percentile, from top
female          byte   %9.0g                  =1 if female
white           byte   %9.0g                  =1 if white
black           byte   %9.0g                  =1 if black
hsizesq         float  %9.0g                  hsize^2
---------------------------------------------------------------------------
Sorted by:
Multiple Regression Analysis: Inference


. regress colgpa sat hsperc hsize hsizesq

      Source |       SS       df       MS              Number of obs =    4137
-------------+------------------------------           F(  4,  4132) =  398.02
       Model |  499.030504     4  124.757626           Prob > F      =  0.0000
    Residual |  1295.16517  4132  .313447524           R-squared     =  0.2781
-------------+------------------------------           Adj R-squared =  0.2774
       Total |  1794.19567  4136  .433799728           Root MSE      =  .55986

------------------------------------------------------------------------------
      colgpa |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         sat |   .0014925   .0000652    22.89   0.000     .0013646    .0016204
      hsperc |  -.0138558    .000561   -24.70   0.000    -.0149557   -.0127559
       hsize |  -.0608815   .0165012    -3.69   0.000    -.0932328   -.0285302
     hsizesq |   .0054603   .0022698     2.41   0.016     .0010102    .0099104
       _cons |   1.492652   .0753414    19.81   0.000     1.344942    1.640362
------------------------------------------------------------------------------

Multiple Regression Analysis: Inference

Now, what if we want to construct a confidence interval for the expected GPA of a
student with 1200 on the SAT, in the 30
th
percentile of his high school class, from a
high school of 500 students?
This time,





Multiple Regression Analysis: Inference

Now, substitute for |
0
in the original regression equation:









This suggests that you can find u by regressing colGPA on (SAT-1200), (hsperc-30),
(hsize-5), and (hsize
2
-5
2
).


2
0 1 2 3 4
2
0 1 2 3 4
1200 30 5 5
1200 30 5 5
u | | | | |
| u | | | |
= + + + +
=


( ) ( )
( )
2
0 1 2 3 4
2
1 2 3 4
2
1 2 3 4
1 2
3 4
1200 30 5 5

1200 30
5
colGPA SAT hsperc hsize hsize u
colGPA
SAT hsperc hsize hsize u
colGPA SAT hsperc
hsize hs
| | | | |
u | | | |
| | | |
u | |
| |
= + + + + +
=
+ + + + +
= + +
+ +





( )
2 2
5 ize u +
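As a check on the algebra, θ evaluated at the coefficient estimates should equal the fitted value of colGPA at SAT = 1200, hsperc = 30, hsize = 5. Using the estimates from the uncentered regression output above:

```python
# Coefficients from the regression of colgpa on sat, hsperc, hsize, hsizesq
b0, b1, b2, b3, b4 = 1.492652, 0.0014925, -0.0138558, -0.0608815, 0.0054603

# theta = beta0 + 1200*beta1 + 30*beta2 + 5*beta3 + 5^2*beta4
theta = b0 + 1200 * b1 + 30 * b2 + 5 * b3 + 5**2 * b4
print(round(theta, 4))  # 2.7001, the expected GPA for this student
```

Up to rounding, this matches the intercept of the centered regression shown below.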
Multiple Regression Analysis: Inference

We can do that by first transforming the X variables:

. generate sat0=sat-1200

. generate hsperc0=hsperc-30

. generate hsize0=hsize-5

. generate hsizesq0=hsizesq-25
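The centering trick works in any OLS setting: after shifting each regressor by the chosen value, the intercept becomes the predicted value at that point. A minimal sketch with one regressor and simulated data (closed-form simple OLS, standard library only; all numbers are made up for illustration):

```python
import random

random.seed(0)

# Simulate y = 2 + 0.5*x + noise
n = 200
x = [random.uniform(0, 10) for _ in range(n)]
y = [2.0 + 0.5 * xi + random.gauss(0, 0.3) for xi in x]

def simple_ols(xs, ys):
    """Return (intercept, slope) from a simple linear regression."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    slope = sxy / sxx
    return my - slope * mx, slope

x0 = 4.0                                       # point where we want the prediction
a, b = simple_ols(x, y)                        # original regression
a0, b0 = simple_ols([xi - x0 for xi in x], y)  # regression on the centered x

# The slope is unchanged, and the intercept of the centered regression
# equals the fitted value at x0
print(abs(a0 - (a + b * x0)) < 1e-8)  # True
```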

Multiple Regression Analysis: Inference

. regress colgpa sat0 hsperc0 hsize0 hsizesq0

      Source |       SS       df       MS              Number of obs =    4137
-------------+------------------------------           F(  4,  4132) =  398.02
       Model |  499.030503     4  124.757626           Prob > F      =  0.0000
    Residual |  1295.16517  4132  .313447524           R-squared     =  0.2781
-------------+------------------------------           Adj R-squared =  0.2774
       Total |  1794.19567  4136  .433799728           Root MSE      =  .55986

------------------------------------------------------------------------------
      colgpa |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        sat0 |   .0014925   .0000652    22.89   0.000     .0013646    .0016204
     hsperc0 |  -.0138558    .000561   -24.70   0.000    -.0149557   -.0127559
      hsize0 |  -.0608815   .0165012    -3.69   0.000    -.0932328   -.0285302
    hsizesq0 |   .0054603   .0022698     2.41   0.016     .0010102    .0099104
       _cons |   2.700075   .0198778   135.83   0.000     2.661104    2.739047
------------------------------------------------------------------------------

The results tell us that the expected GPA of a student with these characteristics is 2.70.
The 95% confidence interval for the prediction is [2.66, 2.74].
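As a quick arithmetic check, the reported interval is just the intercept plus or minus roughly 1.96 standard errors:

```python
theta_hat = 2.700075   # _cons from the centered regression
se = 0.0198778         # its standard error

# With 4,132 degrees of freedom, the t critical value is essentially 1.96
lower = theta_hat - 1.96 * se
upper = theta_hat + 1.96 * se
print(round(lower, 2), round(upper, 2))  # 2.66 2.74
```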

Multiple Regression Analysis: Inference

Note that the confidence interval for the average GPA of a student with these
characteristics is not the same as the confidence interval for a prediction of the actual
GPA of an individual student with these characteristics.
This is because the unobservables, u, contribute to the variance of the individual
student's GPA.





Multiple Regression Analysis: Inference

In particular, consider the variance of the GPA of student i (writing colGPAhat for
the fitted value):

colGPA_i = colGPAhat_i + u_i
Var(colGPA_i) = Var(colGPAhat_i) + Var(u_i)

We just found a confidence interval for the fitted value of that student's college GPA,
not her GPA itself.
As we collect more and more data, the sampling variance of the fitted value will
decline, because we can estimate it more accurately, but the variance of the
unobservables doesn't change. So there is a limit to how accurately we can predict an
individual's GPA, no matter how much we know about other people's GPAs.

Multiple Regression Analysis: Inference

To find the standard error of the forecast of an individual student's GPA, we write

Var(colGPA_i) = Var(colGPAhat_i) + Var(u_i)
Se(colGPA_i) = sqrt( Var(colGPAhat_i) + Var(u_i) )
Se(colGPA_i) = sqrt( Se(colGPAhat_i)² + Var(u_i) )

The standard error of θ from the regression output is the estimated standard error of
the fitted value of college GPA. The estimated square root of the variance of u is
given in the regression output as Root MSE (root mean squared error).

Multiple Regression Analysis: Inference

Therefore, the standard error of the forecast of an individual student's college GPA,
with 1200 on the SAT, in the 30th percentile of her high school class, from a high
school of 500 students, is:

Se(colGPA_i) = sqrt( Se(colGPAhat_i)² + Var(u_i) )
Se(colGPA_i) = sqrt( 0.0198778² + 0.55986² )
Se(colGPA_i) = 0.5602
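The arithmetic can be checked in a line or two; the two inputs are the standard error of the fitted value and the Root MSE from the Stata output:

```python
import math

se_fit = 0.0198778   # standard error of the fitted value (_cons in the centered regression)
root_mse = 0.55986   # Root MSE: estimated standard deviation of u

# The forecast standard error combines estimation error and the variance of u
se_forecast = math.sqrt(se_fit**2 + root_mse**2)
print(round(se_forecast, 4))  # 0.5602
```

Note that the forecast standard error is dominated by the Root MSE: the estimation error of the fitted value is tiny by comparison, which is the "limit to how accurately we can predict" point made above.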
Multiple Regression Analysis: Inference

How can we construct a confidence interval for the forecast?
Since there are many observations (4,000+), the sampling distribution is very close to
normal, so we can use 1.96 as the 95% critical value. (The exact distribution is t with
n − k − 1 degrees of freedom.)

Multiple Regression Analysis: Inference

The 95% confidence interval for the forecast is

[ 2.70 − 1.96(0.560), 2.70 + 1.96(0.560) ]
= [ 1.60, 3.80 ]

This interval will contain a particular student's GPA 95% of the time, if she has the
indicated characteristics.
This is very different from the confidence interval for the prediction, which contains
the expected value 95% of the time.

Multiple Regression Analysis: Inference

This means that, though the factors in the regression were important, we cannot use
them to accurately pin down an individual's GPA.
There are many factors other than SAT scores and high school performance that
determine it as well (and these are included in u).

Multiple Regression Analysis: Inference

Now we will discuss testing joint hypotheses, that is, the hypothesis that two or more
facts are both true.
One common use of joint hypothesis testing is to test a set of exclusion restrictions:
the hypothesis that a group of variables do not affect the dependent variable once the
other variables have been included in the model.
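The endpoints of the forecast interval are simple arithmetic on the numbers above:

```python
theta_hat = 2.70      # expected GPA for this student
se_forecast = 0.560   # standard error of the forecast

lower = theta_hat - 1.96 * se_forecast
upper = theta_hat + 1.96 * se_forecast
print(round(lower, 2), round(upper, 2))  # 1.6 3.8
```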

Multiple Regression Analysis: Inference

For example, if we have the linear model

y = β0 + β1 x1 + β2 x2 + β3 x3 + u

and we want to test the hypothesis that neither x2 nor x3 has an effect on y once x1
has been included in the model, we are testing the joint hypothesis that

H0: β2 = 0, β3 = 0

Multiple Regression Analysis: Inference

It is not appropriate to do two t tests of the single hypotheses that β2 and β3 are each
equal to zero and conclude that, if you fail to reject both hypotheses separately, you
fail to reject the joint hypothesis.
Particularly if x2 and x3 are highly correlated, they may be jointly significant but not
individually so.

Multiple Regression Analysis: Inference

For example, when analyzing the relationship between one's extramarital affairs,
one's age, and the duration of one's marriage, we knew that older people (who were
married for longer) had more affairs.
But we did not know if it was because they were old or because they were married
for a long time.
In this case, we might not have evidence to say that either variable (age or duration
of marriage) on its own has an effect on the number of affairs.

Multiple Regression Analysis: Inference

Therefore, age and duration may not be individually significant.
But we are confident that at least one of the two variables (age or duration of
marriage) matters, so they are likely jointly significant.
In other words, we can't reject the hypothesis that age alone is unrelated to the
number of affairs, and we can't reject the hypothesis that duration of marriage alone
is unrelated to the number of affairs.
But we are confident they are not both unrelated. Therefore we might reject the joint
hypothesis, even though we cannot reject the individual hypotheses.
Multiple Regression Analysis: Inference

The idea behind testing a set of exclusion restrictions is to run two regressions: one
with the variables we want to test and one without.
The next step is to compare the sums of squared residuals of the two regressions.
If the sum of squared residuals declines by a very small amount when the extra
variables are included, they may be unnecessary, and we fail to reject the hypothesis
that their coefficients are jointly equal to zero.
If it declines by a large amount, the variables are important and we reject the
hypothesis that they have no effect on the dependent variable.

Break

Multiple Regression Analysis: Inference

Last time we discussed testing hypotheses that involve more than one parameter.
The method is to define a third parameter, θ, that is zero when the hypothesis you
want to test is true. For example, to test βi = βj, you would define

θ = βi − βj

Then manipulate the equation so you can estimate θ directly, and test the hypothesis
that this third parameter is equal to zero.

Multiple Regression Analysis: Inference

For example, consider the model

y = β0 + β1 x1 + β2 x2 + u

and we want to test the hypothesis that β1 = β2.
Substitute θ = β1 − β2 for β1 (that is, β1 = θ + β2):

y = β0 + (θ + β2) x1 + β2 x2 + u
Multiple Regression Analysis: Inference

Now collect terms:

y = β0 + θ x1 + β2 (x1 + x2) + u

So what we can do is define a third variable z = x1 + x2, and run this regression:

y = β0 + θ x1 + β2 z + u

If θ, the coefficient on x1, is not significantly different from zero, then β1 and β2 are
not significantly different from each other.

Multiple Regression Analysis: Inference

We then started to talk about how to test joint hypotheses.
The idea behind testing a set of exclusion restrictions is to run two regressions: one
with the variables we want to test and one without.
The next step is to compare the sums of squared residuals of the two regressions.
If the sum of squared residuals declines by a very small amount when the extra
variables are included, they may be unnecessary, and we fail to reject the hypothesis
that their coefficients are jointly equal to zero.

Multiple Regression Analysis: Inference

If it declines by a large amount, the variables are important and we reject the
hypothesis that they have no effect on the dependent variable.
For example, suppose that we are examining the salaries of professional baseball
players.
We want to find out whether players' performance statistics (career batting average,
number of home runs, etc.) are related to salaries.

Multiple Regression Analysis: Inference

The specific model we have in mind is:

ln(salary) = β0 + β1 years + β2 gamesyr + β3 bavg + β4 hrunsyr + β5 rbisyr + u

The hypothesis we are interested in testing is the joint hypothesis that none of the
performance statistics (bavg, hrunsyr, rbisyr) has any effect on salaries:

H0: β3 = 0, β4 = 0, β5 = 0
Multiple Regression Analysis: Inference
Contains data from D:\Econometrics\Statafiles\MLB1.DTA
  obs:           353
 vars:            47                          16 Sep 1996 15:53
 size:        46,949 (95.0% of memory free)
-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
salary          float  %9.0g                  1993 season salary
years           byte   %9.0g                  years in major leagues
bavg            float  %9.0g                  career batting average
gamesyr         float  %9.0g                  games per year in league
hrunsyr         float  %9.0g                  home runs per year
rbisyr          float  %9.0g                  rbis per year
lsalary         float  %9.0g                  log(salary)
-------------------------------------------------------------------------------
Sorted by:

Multiple Regression Analysis: Inference
Let's first look at the regression model without the performance statistics.

. regress lsalary years gamesyr

      Source |       SS       df       MS              Number of obs =     353
-------------+------------------------------           F(  2,   350) =  259.32
       Model |  293.864058     2  146.932029           Prob > F      =  0.0000
    Residual |  198.311477   350  .566604221           R-squared     =  0.5971
-------------+------------------------------           Adj R-squared =  0.5948
       Total |  492.175535   352  1.39822595           Root MSE      =  .75273

------------------------------------------------------------------------------
     lsalary |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       years |    .071318    .012505     5.70   0.000     .0467236    .0959124
     gamesyr |   .0201745   .0013429    15.02   0.000     .0175334    .0228156
       _cons |    11.2238    .108312   103.62   0.000     11.01078    11.43683
------------------------------------------------------------------------------

Multiple Regression Analysis: Inference
Now lets look at the model with the performance statistics:

. r egr ess l sal ar y year s gamesyr bavg hr unsyr r bi syr

Sour ce | SS df MS Number of obs = 353
- - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - F( 5, 347) = 117. 06
Model | 308. 989208 5 61. 7978416 Pr ob > F = 0. 0000
Resi dual | 183. 186327 347 . 527914487 R- squar ed = 0. 6278
- - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Adj R- squar ed = 0. 6224
Tot al | 492. 175535 352 1. 39822595 Root MSE = . 72658

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
l sal ar y | Coef . St d. Er r . t P>| t | [ 95%Conf . I nt er val ]
- - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
year s | . 0688626 . 0121145 5. 68 0. 000 . 0450355 . 0926898
gamesyr | . 0125521 . 0026468 4. 74 0. 000 . 0073464 . 0177578
bavg | . 0009786 . 0011035 0. 89 0. 376 - . 0011918 . 003149
hr unsyr | . 0144295 . 016057 0. 90 0. 369 - . 0171518 . 0460107
r bi syr | . 0107657 . 007175 1. 50 0. 134 - . 0033462 . 0248776
_cons | 11. 19242 . 2888229 38. 75 0. 000 10. 62435 11. 76048
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Multiple Regression Analysis: Inference

Notice that the residual sum of squares diminished from 198.3 to 183.2.
This means that the model fits better: since the residuals are smaller, there is less
distance between the actual and the fitted values.
The question is, is this improvement enough to justify including the three extra
variables in the model, or could it just be random noise?

Multiple Regression Analysis: Inference

In order to answer this question, we need to construct an F statistic:

F = [ (SSR_r − SSR_ur) / q ] / [ SSR_ur / (n − k − 1) ]

where SSR_ur is the sum of squared residuals for the unrestricted model, SSR_r is the
sum of squared residuals for the restricted model, n is the number of observations, k is
the number of variables in the unrestricted model, and q is the number of restrictions
(the number of variables excluded in the restricted model).

Multiple Regression Analysis: Inference

Under the null hypothesis, the F statistic will have an F distribution (the F
distribution is constructed especially for doing hypothesis testing of this type) with
(q, n − k − 1) degrees of freedom.
Note that the F distribution has TWO parameters, q and n − k − 1, not just one like
the t distribution. You need both to do an F test.

Multiple Regression Analysis: Inference

In this case, the F statistic is equal to

F = [ (SSR_r − SSR_ur) / q ] / [ SSR_ur / (n − k − 1) ]
  = [ (198.311 − 183.186) / 3 ] / [ 183.186 / (353 − 5 − 1) ]
  = 9.55

The F statistic is always positive.
When the F statistic exceeds the critical value for an F test with (q, n − k − 1)
degrees of freedom, you reject the null hypothesis.
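A small helper reproduces the calculation; the inputs are the residual sums of squares from the two baseball regressions above:

```python
def f_stat(ssr_r, ssr_ur, q, n, k):
    """F statistic for q exclusion restrictions: compares restricted and
    unrestricted sums of squared residuals."""
    return ((ssr_r - ssr_ur) / q) / (ssr_ur / (n - k - 1))

# Restricted model: lsalary on years, gamesyr (SSR = 198.311477)
# Unrestricted model adds bavg, hrunsyr, rbisyr (SSR = 183.186327)
F = f_stat(198.311477, 183.186327, q=3, n=353, k=5)
print(round(F, 2))  # 9.55
```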

Multiple Regression Analysis: Inference

In this case, looking at Table G.3b in the textbook tells us that the critical value for
an F distribution with (3, 120) degrees of freedom is 2.68.
We actually want (3, 347) degrees of freedom, but this is close.
Since 9.55 is much greater than 2.68, we clearly reject the hypothesis that none of the
performance statistics influences baseball players' salaries.

Multiple Regression Analysis: Inference

Stata can do F tests:

. test bavg hrunsyr rbisyr

 ( 1)  bavg = 0.0
 ( 2)  hrunsyr = 0.0
 ( 3)  rbisyr = 0.0

       F(  3,   347) =    9.55
            Prob > F =    0.0000

Multiple Regression Analysis: Inference

Why are the variables jointly significant but not individually? It turns out they are
highly correlated:

. corr bavg hrunsyr rbisyr
(obs=353)

             |     bavg  hrunsyr   rbisyr
-------------+---------------------------
        bavg |   1.0000
     hrunsyr |   0.1906   1.0000
      rbisyr |   0.3291   0.8907   1.0000

So we know that the best players get paid more, but whether this is because of home
runs, batting average, or something else, we can't tell.





Multiple Regression Analysis: Inference

It is also possible to test a set of general linear restrictions, as opposed to merely
exclusion restrictions (hypotheses that some of the β parameters are equal to zero).
As an example we will look at the rationality of housing price assessments.
If assessors of housing prices are doing their job correctly, then their assessment
should incorporate the value of everything you can observe about the house, for
example the number of bedrooms and the number of square feet.

Multiple Regression Analysis: Inference

In a regression context, this means that if you are predicting the sale price of a house
based on its assessed value, you shouldn't need variables for the number of bedrooms
or the size of the property.
If you do, it means the price assessment hasn't taken these variables into account
properly.

Multiple Regression Analysis: Inference
Specifically, the model that we have in mind is

ln(price) = β0 + β1 ln(assess) + β2 ln(lotsize) + β3 ln(sqrft) + β4 bdrms + u

This is the unrestricted model.
The hypothesis we want to test is:

H0: β1 = 1, β2 = 0, β3 = 0, β4 = 0

Multiple Regression Analysis: Inference

We can construct the restricted model by substituting the restrictions
(β1 = 1; β2 = 0; β3 = 0; β4 = 0) into the unrestricted model.
The restricted model is then

ln(price) = β0 + ln(assess) + u
ln(price) − ln(assess) = β0 + u
Multiple Regression Analysis: Inference

. d

Contains data from hprice1.dta
  obs:            88
 vars:            10                          17 Mar 2002 12:21
 size:         3,168 (99.5% of memory free)
-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
price           float  %9.0g                  house price, $1000s
assess          float  %9.0g                  assessed value, $1000s
bdrms           byte   %9.0g                  number of bdrms
lotsize         float  %9.0g                  size of lot in square feet
sqrft           int    %9.0g                  size of house in square feet
colonial        byte   %9.0g                  =1 if home is colonial style
lprice          float  %9.0g                  log(price)
lassess         float  %9.0g                  log(assess)
llotsize        float  %9.0g                  log(lotsize)
lsqrft          float  %9.0g                  log(sqrft)
-------------------------------------------------------------------------------
Sorted by:

Multiple Regression Analysis: Inference

The unrestricted model is:

. regress lprice lassess llotsize lsqrft bdrms

      Source |       SS       df       MS              Number of obs =      88
-------------+------------------------------           F(  4,    83) =   70.58
       Model |  6.19607473     4  1.54901868           Prob > F      =  0.0000
    Residual |  1.82152879    83   .02194613           R-squared     =  0.7728
-------------+------------------------------           Adj R-squared =  0.7619
       Total |  8.01760352    87  .092156362           Root MSE      =  .14814

------------------------------------------------------------------------------
      lprice |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     lassess |   1.043065    .151446     6.89   0.000     .7418453    1.344285
    llotsize |   .0074379   .0385615     0.19   0.848    -.0692593    .0841352
      lsqrft |  -.1032384   .1384305    -0.75   0.458     -.378571    .1720942
       bdrms |   .0338392   .0220983     1.53   0.129    -.0101135    .0777918
       _cons |    .263743   .5696647     0.46   0.645    -.8692972    1.396783
------------------------------------------------------------------------------







Multiple Regression Analysis: Inference

To estimate the restricted model, note that under the null (β1 = 1; β2 = β3 = β4 = 0) the
model reduces to ln(price) − ln(assess) = β0 + u, so we create a new dependent variable
equal to ln(price) − ln(assess):

. generate lpassess = lprice - lassess

. regress lpassess

      Source |       SS       df       MS              Number of obs =      88
-------------+------------------------------           F(  0,    87) =    0.00
       Model |        0.00     0          .            Prob > F      =       .
    Residual |  1.88014885    87  .021610906           R-squared     =  0.0000
-------------+------------------------------           Adj R-squared =  0.0000
       Total |  1.88014885    87  .021610906           Root MSE      =  .14701

------------------------------------------------------------------------------
    lpassess |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _cons |  -.0848135   .0156709    -5.41   0.000    -.1159612   -.0536658
------------------------------------------------------------------------------


Multiple Regression Analysis: Inference

In this case, q (the number of restrictions) is 4, n (the number of observations) is 88,
and k (the number of regressors in the unrestricted model) is 4. The F statistic for the
hypothesis test is:

F = [(SSR_r − SSR_ur)/q] / [SSR_ur/(n − k − 1)]
  = [(1.8801 − 1.8215)/4] / [1.8215/(88 − 4 − 1)]
  ≈ 0.667

From table G.3b, the critical value for an F distribution with (4, 90) degrees of freedom
(the table entry closest to our (4, 83)) is 2.47. 0.667 is clearly less than that, so we
fail to reject the hypothesis that assessed prices accurately take into account the
observable characteristics of the property (β1 = 1; β2 = 0; β3 = 0; β4 = 0).
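The arithmetic above is easy to check numerically. A minimal sketch in Python (assuming
scipy is available for the exact critical value; the variable names are illustrative):

```python
from scipy import stats

# Sums of squared residuals reported by the two Stata regressions above
ssr_r = 1.88014885   # restricted model: lpassess regressed on a constant only
ssr_ur = 1.82152879  # unrestricted model
q, n, k = 4, 88, 4   # restrictions, observations, unrestricted regressors

F = ((ssr_r - ssr_ur) / q) / (ssr_ur / (n - k - 1))
crit = stats.f.ppf(0.95, q, n - k - 1)  # exact 5% critical value for F(4, 83)
print(round(F, 3), F < crit)  # 0.668 True -> fail to reject at the 5% level
```

The exact critical value for 83 denominator degrees of freedom is close to the table's
2.47 for 90, so the conclusion is the same either way.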

Multiple Regression Analysis: Inference

Another form of the F statistic that is sometimes seen is the R-squared form.
This derivation requires the following facts, from the definition of R-squared:









R² = SSE/SST = 1 − SSR/SST
⇒  SSR/SST = 1 − R²
⇒  SSR = SST(1 − R²)

Multiple Regression Analysis: Inference


Start from the SSR form and substitute SSR = SST(1 − R²) for each model. Because both
regressions have the same dependent variable, SST is the same in the numerator and the
denominator, so it cancels:

F = [(SSR_r − SSR_ur)/q] / [SSR_ur/(n − k − 1)]
  = [(SST(1 − R²_r) − SST(1 − R²_ur))/q] / [SST(1 − R²_ur)/(n − k − 1)]
  = [((1 − R²_r) − (1 − R²_ur))/q] / [(1 − R²_ur)/(n − k − 1)]
  = [(R²_ur − R²_r)/q] / [(1 − R²_ur)/(n − k − 1)]

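For an exclusion restriction, the SSR form and the R-squared form give identical numbers.
A quick sanity check on synthetic data (numpy assumed; the names and the data-generating
process are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 1 + 0.5 * x1 + 0.2 * x2 + rng.normal(size=n)

def fit(cols, y):
    """OLS via least squares; return the SSR and R-squared."""
    X = np.column_stack([np.ones(len(y))] + list(cols))
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    ssr = resid @ resid
    sst = ((y - y.mean()) ** 2).sum()
    return ssr, 1 - ssr / sst

ssr_ur, r2_ur = fit([x1, x2], y)  # unrestricted: x1 and x2
ssr_r, r2_r = fit([x1], y)        # restricted: exclude x2 (q = 1)
q, k = 1, 2
F_ssr = ((ssr_r - ssr_ur) / q) / (ssr_ur / (n - k - 1))
F_r2 = ((r2_ur - r2_r) / q) / ((1 - r2_ur) / (n - k - 1))
print(np.isclose(F_ssr, F_r2))  # True: the two forms coincide
```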
Multiple Regression Analysis: Inference

Note: this form of the F statistic is ONLY valid when you are testing exclusion
restrictions (hypotheses that β parameters are jointly equal to zero).
If you are testing other types of restrictions, like the example we just did, you must
use the SSR form.
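To see why, try (incorrectly) plugging the two R-squareds from the housing example into
the R-squared form. Because lprice and lpassess are different dependent variables, the
two SSTs differ and the cancellation in the derivation fails:

```python
# R-squareds reported by the two Stata regressions earlier in these notes
r2_ur, r2_r = 0.7728, 0.0000
q, n, k = 4, 88, 4

F_wrong = ((r2_ur - r2_r) / q) / ((1 - r2_ur) / (n - k - 1))
print(round(F_wrong, 1))  # 70.6 -- nowhere near the correct 0.667
```

The naive answer is just the overall-significance F of the unrestricted regression
(70.58 in the Stata output), which is not the test we want.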

Multiple Regression Analysis: Inference
A general strategy for testing multiple linear hypotheses is:
First, estimate the model, and find the residual sum of squares.
Next, assume the hypotheses are true, and use these assumptions to find an
equation for the restricted model. For example, assuming a coefficient is equal to
zero means the regression model will no longer include that variable.

Multiple Regression Analysis: Inference

Next, estimate the restricted model, and find the residual sum of squares for that
model.
Use the residual sums of squares from the restricted model and the unrestricted
model, along with q (the number of restrictions), n (the number of observations),
and k (the number of variables in the unrestricted model) to calculate an F statistic
for the test.


F = [(SSR_r − SSR_ur)/q] / [SSR_ur/(n − k − 1)]
Multiple Regression Analysis: Inference

Find the critical value for the F test by looking at the appropriate table (G.3 in your
textbook) for an F distribution with (q,n-k-1) degrees of freedom.
You must choose the significance level: G.3a is 10%, G.3b is 5%, and G.3c is 1%.
On this table, q is the numerator degrees of freedom, and n-k-1 is the
denominator degrees of freedom.
If the F statistic exceeds the critical value, you reject the hypothesis. Otherwise
you fail to reject the hypothesis.
