
Course manual Econometrics: Week 1

About the course manual

At the beginning of every week we will upload a course manual for the corresponding
week. We decided to create a course manual for this course to have one source where
all information concerning both the content and the organization of the course is made
available to the students. Important information that is contained in the lecture slides
will be repeated in this manual. It is therefore obligatory for all students taking this course
to read the manual for every week of the semester. E-mails containing questions that
could be answered by reading the manual will not be replied to. The most important part
of the manual is this week's, which contains background information about the course,
some tips on how to study for the course, as well as important dates.
Besides some organizational information, the course manual will contain detailed reading
instructions for every week, including both obligatory and voluntary literature. Furthermore, obligatory and additional exercises for each week will be provided. Finally, starting
from week 2 the manual will contain the solutions to the obligatory exercises.

People

The course is taught by Jun.-Prof. Dr. Hans Manner, who gives both the lectures and the
exercise meetings. He studied econometrics at the University of Maastricht from 2001 to
2005, followed by a PhD at the same university from 2006 to 2010. Since April 2010 he has
worked at the Department for Social and Economic Statistics, being attached to the chair
of Prof. Dr. Karl Mosler. He can be reached by e-mail (manner@statistik.uni-koeln.de), by
phone (0221-4704130) or by coming to his office located at Meister-Ekkehart-Str. 9,
second floor, during his office hours, which can be found on his webpage.
Walter Orth assists in designing and preparing the course, and may occasionally take over
a class in case of emergency. In particular, he is responsible for preparing the exercises
and their solutions. Nevertheless, he can also be contacted in case students have questions
regarding the content of the course. His e-mail address is orth@statistik.uni-koeln.de
and he can be visited during his office hours, which can be found on his webpage. Currently
Walter Orth is a PhD student of Prof. Mosler. Before that he studied mathematics and
economics in Duisburg-Essen.

General information about the course

Econometrics is an extremely important topic for all economics students and for many
business students as well. Sufficient knowledge of econometrics is often a prerequisite, or
at least extremely useful, for taking many economics and finance courses. Additionally,
writing a Master's thesis often requires performing an empirical analysis using econometric techniques. The aim of this course is to prepare the students for such tasks and to
teach the most relevant econometric techniques needed to analyze economic data.
While it is very useful, most students find econometrics to be a rather difficult subject.
First of all, it requires knowledge and understanding of many topics from mathematics
and statistics. Although all students have learned these things, only few actually remember the relevant things when needed. Econometrics itself requires understanding often
complex theory and knowing when certain things need to be applied. There are many
subtle issues of when and how different methods may or may not be applied. Therefore,
mastering econometrics requires hard work, time, practice and reflecting on what has
been learned. We tried to design the course in a way that finds a good balance between
lectures/exercise sessions and self study. We hope that, by following our recommendations
on how to study for this course, students will be able to acquire sufficient knowledge
of econometrics not only to pass the exam without difficulty, but also to apply basic
econometric techniques in practice.
Students should nevertheless be aware that, for a Master's course in econometrics, the
material covered will not be very advanced. Students who have good prior knowledge, for
example those who have taken the Profilgruppe Quantitative Methoden der Wirtschafts-
und Sozialwissenschaften in Cologne, are recommended to skip this course and move
right away into an advanced econometrics course like Time Series Analysis or Advanced
Econometrics. Those who do take the course, find it interesting, and want to learn
more about econometrics are recommended to take one of the follow-up courses offered
by our department. There are many interesting and important topics in econometrics
that cannot be covered in a one-semester course and it definitely pays off to have this
knowledge.

Prior knowledge

As mentioned above, prior knowledge in mathematics and statistics is absolutely crucial.


All students who have done their bachelor studies in Cologne should be fine. In general,
students are expected to have taken two statistics courses, including an introduction to the
linear regression model. We recommend that you review your old textbook or notes whenever
you find it necessary. The most important concepts will again be covered in this course,
but this will be very brief and without much explanation. Having taken an additional
course in econometrics at the bachelor level would be very helpful, but is not necessary.
Students who have studied econometrics before should decide for themselves if they want
to take this course, accepting that they will hear many things they have learned before, or
if they want to move right away to a more advanced course. This depends on how intensive
their former course was and how well they performed. If you have any doubt on this issue

feel free to contact the professor.

The textbook and reading

The main textbook for the course is A guide to modern econometrics by Marno Verbeek.
The course is based on the third edition of the book, but if you have a different edition
there should be no problem. Each week we will give you detailed reading instructions and we will
follow the structure of this book quite closely. This means that the lecture slides and the
corresponding book chapters are close to each other in terms of structure, content and
notation. This is intended, as we believe that this will facilitate combining self study and
lectures. Reading the book is obligatory and you are required to know everything from
the weekly reading. We highly recommend you to buy this book, as it will not only be
extremely useful for studying this course, but will also serve as a reference in the future.
If you are not willing or able to buy this book there are currently 16 copies available at
the Lehrbuchsammlung, as well as further copies at other libraries of the faculty.
There are other textbooks that you may want to consider when studying for this course.
If you have studied econometrics in the past the book you have used back then should
be useful, also given that it is familiar to you. We recommend three additional books for
which we will give some broad reading indications each week. The books Introductory Econometrics by Jeffrey Wooldridge and Introduction to Econometrics by James
Stock and Mark Watson are a little bit easier than the Verbeek. Econometric Analysis
by William Greene is much more advanced, but also contains more details and many
additional topics.
For those who prefer to consult a German textbook we suggest Einführung in die Ökonometrie by Walter Assenmacher or Ökonometrie. Eine Einführung by Ludwig von Auer.

gretl

Econometrics requires using a computer program to analyze data. Many such programs
exist and each program has its pros and cons. We have decided to use gretl, because
it allows the use of a wide range of econometric techniques while at the same time being
freeware. Many other programs such as EVIEWS, STATA or Matlab may be more
suitable for certain specialized tasks, but they are expensive and students cannot use
them at home. Given that this course is too large for meeting in the computer lab, this
would be very problematic.
gretl can be downloaded for free from http://gretl.sourceforge.net/. There you also
find a lot of additional information about the program and a lot of add-ons. In particular,
you can directly load the complete data sets used in the Verbeek book, but also data from
other books, which makes doing the empirical exercises much easier. An introduction to
gretl will be given during the second exercise meeting.

Studying for the course

From the above it should be rather clear how the ideal student should study for this
course. You should read the obligatory literature either before or after the corresponding
lecture. If, after both lecture and reading, a topic is not understood you may want to look
up the topic in one of the other books. If you find the course difficult and have problems
understanding everything, you may decide to read one of the other books continuously.
The exercises should at least be read and thought about before the second class of the
week, but ideally students should have tried to solve them by themselves. Just watching
the professor present the solution may seem fine to you, but experience has shown that
for most students this does not suffice. After the exercise meeting it is recommended to
review the solutions of those exercises that appeared difficult. Note that solutions to all
obligatory exercises will be provided after the meetings. Again, those who have problems
with the material should try the additional exercises and try to answer the review questions
(for which we will in general not provide the answers).

The exam

There will be two exams offered this semester. The first will take place on 31.01.2012 at
14.00h in Aula 1. This is during the last week of the semester and may be at the same
time as other exams or classes. Therefore the second exam date will be on 20.03.2012 at
10.00h in HS C. The exams will last 60 minutes and you are allowed to bring a pocket
calculator and one A4 piece of paper with any notes you desire. However, you will not be
given a formula sheet, so all formulas you think you will need must be included in those
notes. There are hardly any past exams available to help you study, since this course is
relatively new and is taught by Jun.-Prof. Manner for the first time. However, during
the lectures you can expect to get some hints about what may be useful for the exam. The
last lecture of the semester will be spent reviewing the most relevant material and should
again serve as a preparation for the exam. For now it can be said that the main focus
will be on understanding the material of the course rather than exclusively being able to
perform mechanical calculations (although that may also be a part). The exam questions
will certainly require students to interpret computer output and relate it to theoretical
concepts.

This week's reading

Obligatory reading: Appendix A and B, as well as Chapter 1 of Verbeek


Additional reading: The appendices of Wooldridge are very detailed about the mathematical and statistical basics. They also contain many examples. Reading all of it is probably
a bit too much, but you may want to read the sections corresponding to topics that you
find difficult.
Chapter 1 of Wooldridge gives you a nice introduction to what econometrics is about.
Appendices A and B of Greene are very detailed and contain much more information than
we need. You may want to consult them if you are looking for more comprehensive secondary
reading.


Exercises (obligatory)

1) Calculate the following sums and products (as far as they are defined):

A = [3 −1; 2 0] · [6 7 3; 2 1 4]          B = 6 + [3 7; 1 2]

C = [1 0; 1 2] · [1 2]                    D = [3; 7] · [1 9]

E = [4 2] · [3 5; 9 −7] · [3; 9]          F = [6 3 −3; 4 1 2]' · [1 8 4; 7 −5 −2]

(Matrices are written row by row, with rows separated by semicolons.)

2) Let A and B be n × n matrices with full rank. Calculate

a) (AB)'(B⁻¹A⁻¹)'
b) (A(A⁻¹ + B⁻¹)B)(B + A)⁻¹

3) Consider the matrix

X = (x_1  x_2  ⋯  x_n) ,

where x_1, x_2, ..., x_n have K entries each. Show that

Σ_{i=1}^{n} x_i x_i' = X X' .

4)
a) Let X1, X2, Y1, Y2 be random variables. Show that

cov{X1 + X2, Y1 + Y2} = cov{X1, Y1} + cov{X1, Y2} + cov{X2, Y1} + cov{X2, Y2}

b) Now, X is a column vector of random variables with m entries. Note that the
covariance matrix of X is defined as V{X} = E{(X − E{X})(X − E{X})'} =
E{XX'} − E{X}E{X}'. Further, let A be an n × m matrix of constants. Show that

V{AX} = A V{X} A'

Additional exercises

1) Show that

(AB)⁻¹ = B⁻¹A⁻¹

for any two n × n matrices A and B that have full rank (i.e. rank(A) = rank(B) = n).

2) Show that

cov{aX1, bX2} = ab cov{X1, X2}

where a and b are constant (non-random) scalars and X1 and X2 are random variables.

3) Show that E{(X − E{X})(X − E{X})'} = E{XX'} − E{X}E{X}'.

If you still feel unfamiliar with matrix algebra you may want to work through exercises
2, 3 and 4 in chapter 2 of the textbook Mosler/Dyckerhoff/Scheicher: Mathematische
Methoden für Ökonomen.
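If you want to convince yourself numerically of matrix identities like the one in obligatory exercise 3 (the sum of the outer products equals X X'), a few lines of Python suffice. This is only an illustrative sketch with randomly generated numbers and is not part of the course software:

import numpy as np

# Hypothetical example: n = 4 vectors with K = 3 entries each,
# stacked as the columns of X, so X is K x n.
rng = np.random.default_rng(0)
K, n = 3, 4
X = rng.normal(size=(K, n))

# Sum of the outer products x_i x_i'
outer_sum = sum(np.outer(X[:, i], X[:, i]) for i in range(n))

# This should coincide with X X'
print(np.allclose(outer_sum, X @ X.T))   # True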

Course manual Econometrics: Week 2

1 General information
This week is concerned with introducing the linear regression model along with model
assumptions, its estimation by ordinary least squares and the properties of the OLS
estimator both in finite samples and asymptotically.

2 Reading
Obligatory reading: Verbeek, Sections 2.1, 2.2, 2.3 and 2.6.
Additional reading: Wooldridge, Chapters 2,3 and 5. The material that we cover is
distributed over these Chapters in Wooldridge and many topics we cover in the coming
weeks are treated in between. Therefore you either have to look for the relevant sections beforehand
or read these chapters completely after we have treated all the subjects.
Greene, Chapter 2, Chapter 3.1 and 3.2, Chapter 4.1-4.6 and 4.9.

3 Exercises
During this week's exercise meeting we will provide an introduction to gretl. You may
print and read the document A short introduction to gretl, the content of which will be
explained in the meeting. If you can't make it to the meeting, make sure to go through
all the steps in the manual and explore some of the features of gretl by clicking through
some of the menus.
Additionally, if time permits we will give you a short introduction to alternative programs
that can be used for applying econometric techniques. Examples of such programs are
EVIEWS, STATA, Matlab and R.

4 Additional exercises
1)
Look at the True/False questions available at http://www.econ.kuleuven.be/gme/. Try
to answer 2.1, 2.3, 2.5 and 2.7.

2)
Consider the simple linear regression model

y_i = β1 + β2 x_i + ε_i ,   i = 1, ..., N ,

and assume that ε_i ∼ N(0, σ²) for i = 1, ..., N, where ε_1, ..., ε_N are independent.

You observe the following values for x_i and y_i:

x_i:    −5     10     0     15     5
y_i:  −12.5    −5     5    17.5   −5

a) Calculate estimates for β1 and β2 using the Ordinary Least Squares method.
b) Estimate σ². (Note: The square root of the estimated σ² is usually called the standard
error of the regression.)
c) Write down the estimated regression along with the estimated standard errors of
β1 and β2.

5 Solution to last week's exercises


1)

A = [3 −1; 2 0] · [6 7 3; 2 1 4] = [16 20 5; 12 14 6]

D = [3; 7] · [1 9] = [3 27; 7 63]

E = [4 2] · [3 5; 9 −7] · [3; 9] = [30 6] · [3; 9] = 144

F = [6 3 −3; 4 1 2]' · [1 8 4; 7 −5 −2] = [6 4; 3 1; −3 2] · [1 8 4; 7 −5 −2] = [34 28 16; 10 19 10; 11 −34 −16]

The matrices B and C are not defined.


2)
a) (AB)'(B⁻¹A⁻¹)' = B'A'(A⁻¹)'(B⁻¹)' = B'A'(A')⁻¹(B')⁻¹ = B'(B')⁻¹ = I

b) (A(A⁻¹ + B⁻¹)B)(B + A)⁻¹ = ((AA⁻¹ + AB⁻¹)B)(B + A)⁻¹ = (B + AB⁻¹B)(B + A)⁻¹
   = (B + A)(B + A)⁻¹ = I

3)

X = (x_1  x_2  ⋯  x_n) ,   x_i = (x_1i, x_2i, ..., x_Ki)' ,   i = 1, ..., n

X X' = (x_1  x_2  ⋯  x_n)(x_1  x_2  ⋯  x_n)'
     = (x_1  x_2  ⋯  x_n) · [x_1'; x_2'; ...; x_n']
     = x_1 x_1' + x_2 x_2' + ... + x_n x_n'      (each term is a K × K matrix)
     = Σ_{i=1}^{n} x_i x_i'

4)
a)
cov{X1 + X2, Y1 + Y2} = E{((X1 + X2) − E{X1 + X2})((Y1 + Y2) − E{Y1 + Y2})}
 = E{(X1 − E{X1} + X2 − E{X2})(Y1 − E{Y1} + Y2 − E{Y2})}
 = E{(X1 − E{X1})(Y1 − E{Y1}) + (X1 − E{X1})(Y2 − E{Y2}) +
     (X2 − E{X2})(Y1 − E{Y1}) + (X2 − E{X2})(Y2 − E{Y2})}
 = E{(X1 − E{X1})(Y1 − E{Y1})} + E{(X1 − E{X1})(Y2 − E{Y2})} +
   E{(X2 − E{X2})(Y1 − E{Y1})} + E{(X2 − E{X2})(Y2 − E{Y2})}
 = cov{X1, Y1} + cov{X1, Y2} + cov{X2, Y1} + cov{X2, Y2}

b)
V{AX} = E{(AX − E{AX})(AX − E{AX})'}
 = E{(AX − AE{X})(AX − AE{X})'}
 = E{(A(X − E{X}))(A(X − E{X}))'}
 = E{A(X − E{X})(X − E{X})'A'}
 = A E{(X − E{X})(X − E{X})'} A'
 = A V{X} A'

6 Solution to last week's additional exercises


1)
We only have to show that

(AB)(B⁻¹A⁻¹) = I .

Indeed, it holds that

(AB)(B⁻¹A⁻¹) = A (BB⁻¹) A⁻¹ = AA⁻¹ = I .

2)

cov{aX1, bX2} = E{aX1 · bX2} − E{aX1}E{bX2}
 = ab E{X1 X2} − ab E{X1}E{X2}
 = ab (E{X1 X2} − E{X1}E{X2})
 = ab cov{X1, X2}

3)

E{(X − E{X})(X − E{X})'}
 = E{XX' − XE{X}' − E{X}X' + E{X}E{X}'}
 = E{XX'} − E{XE{X}'} − E{E{X}X'} + E{E{X}E{X}'}
 = E{XX'} − E{X}E{X}' − E{X}E{X}' + E{X}E{X}'
 = E{XX'} − E{X}E{X}'

Course manual Econometrics: Week 3

1 General information
In this week we mainly treat the problem of hypothesis testing in the linear regression
model. Problems related to multicollinearity and how to detect it are treated as well.
Finally, we look at how to make predictions with the linear regression model.

2 Reading
Obligatory reading: Verbeek, Sections 2.4, 2.5, 2.7, 2.8 and 2.9
Additional reading: Wooldridge, Chapter 4 and Section 6.4. The R² is treated on pages 80-81
and multicollinearity on page 95.
Greene, Section 3.5 and Chapter 5. Section 5.5 can be skipped.

3 Exercises
1)
Suppose you estimate a parameter vector β by some estimator b and that your estimator
has the following property:

√N (b − β) → N(0, A) ,

where A is some matrix. Assume further that there is an estimator Â which consistently
estimates A. Now, consider the following empirical results:

N = 10000 ,   b = [2; 3] ,   Â = [400 100; 100 900]

Calculate asymptotic standard errors and t-statistics for b1 and b2.
2)
Consider the following multiple linear regression model:

y_i = β1 + β2 x_i2 + β3 x_i3 + ε_i ,   i = 1, ..., N .

a) Explain how one can test the hypothesis that β3 = 1.
b) Explain how one can test the hypothesis that β2 + β3 = 0. As one alternative,
consider rewriting the model in a way that allows applying the standard t-test.
c) Explain how one can test the hypothesis that β2 = β3 = 0.
d) Explain how one can test the hypothesis that β2 = 0 and β3 = 1.
3)
Load the data set dataweek3ex3.gdt into GRETL.
a) Assume you believe that there exists a linear relationship between y and x2, x3,
x4, and x5. Estimate a linear regression and interpret the output. What are the
most striking findings? What is the most likely explanation for your findings?
b) Use the appropriate tools from the lecture to look for evidence of multicollinearity
in your data.
4)
Load the data set hprice1.gdt from the GRETL introduction and consider again OLS
estimation of the linear regression model
log(price)_i = β1 + β2 log(sqrft)_i + β3 log(lotsize)_i + β4 bedrooms_i + ε_i .

a) Test the hypothesis that β4 = 0.
b) Test the hypothesis that β2 = 1.
c) Test the hypothesis that β2 + β4 = 0.
d) Test the joint hypothesis that β4 = 0 and β2 = 1.

4 Additional exercises
1)
Show that in the linear model

y = Xβ + ε ,   ε ∼ N(0, σ²I) ,

the Wald test for the general linear hypothesis H0: Rβ = q is asymptotically equivalent
to the F-test.
2)
Look again at the True/False questions available at http://www.econ.kuleuven.be/gme/.
Try to answer 2.2, 2.4, 2.6 and 2.8.

5 Solution to last week's additional exercises


2)
a) The matrix containing the regressors is

X = [1 −5; 1 10; 1 0; 1 15; 1 5] .

Calculating the OLS estimator (X'X)⁻¹X'y yields b1 = −5 and b2 = 1. (Note that
the parameter vector b is also sometimes denoted by β̂.)

b) For the standard error of the regression we have to calculate the residuals

e_i = y_i − ŷ_i = y_i − (b1 + b2 x_i) ,   i = 1, ..., N .

In our case we have

ŷ_i:  −10     5    −5    10     0
e_i:  −2.5  −10    10    7.5   −5

Thus, the estimator for σ² is

s² = 1/(N − 2) · Σ_{i=1}^{N} e_i² = 95.8333 ,

so that the standard error of the regression is given by

s = √95.8333 = 9.7895 .

c) The standard errors of b1 and b2 are the square roots of the values on the main
diagonal of V̂{b} = s²(X'X)⁻¹. Here we have se(b1) = 5.3619 and se(b2) = 0.6191,
so that the estimated regression can be written down as

ŷ = −5 + 1 · x
   (5.3619) (0.6191)
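The same steps can be checked on a computer. The following short Python sketch (not part of the course software, which is gretl) mirrors the calculations of this solution:

import numpy as np

# Data from the exercise above
x = np.array([-5.0, 10.0, 0.0, 15.0, 5.0])
y = np.array([-12.5, -5.0, 5.0, 17.5, -5.0])

X = np.column_stack([np.ones_like(x), x])   # regressor matrix with a constant
b = np.linalg.solve(X.T @ X, X.T @ y)       # OLS: (X'X)^{-1} X'y  ->  (-5, 1)

e = y - X @ b                               # residuals
N, K = X.shape
s2 = e @ e / (N - K)                        # estimate of sigma^2 (95.8333)
V_b = s2 * np.linalg.inv(X.T @ X)           # estimated covariance matrix of b
se = np.sqrt(np.diag(V_b))                  # standard errors (5.3619, 0.6191)

print(b, np.sqrt(s2), se)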

Course manual Econometrics: Week 4

General information

This week is concerned with various rather practical, but extremely important aspects
concerning the use of the linear regression model. We discuss how to interpret the estimated parameters of the model in different situations. Furthermore, we study how to
select the set of regressors and how to test the functional form.

Reading

Obligatory reading: Verbeek, read the entire Chapter 3 with the exception of Section
3.2.3
Additional reading: Wooldridge, the relevant material cannot be found in one single place
in this book, but is dispersed over Chapters 2, 3 and 6. As mentioned before, it makes
sense to read the first six chapters of Wooldridge to cover the material of roughly the first four
weeks of this course.
Greene, The relevant parts of Chapters 6 and 7.

Exercises

1)
Consider the simple regression

log(y_i) = β1 + β2 log(x_i) + ε_i ,   i = 1, ..., N .     (1)

a) Show that β2 can be interpreted as the elasticity of y_i with respect to x_i.

b) Calculate the elasticity of y_i with respect to x_i for the alternative model

y_i = β1 + β2 x_i + ε_i ,   i = 1, ..., N .     (2)

Explain the essential difference between the elasticity of model (1) and model (2).

c) Consider now a third model:

log(y_i) = β1 + β2 x_i + ε_i ,   i = 1, ..., N .     (3)

Interpret the coefficient β2.


2)
a) Suppose you want to investigate the question whether a beer tax will reduce traffic fatalities. Further assume that you have data on the number of traffic fatalities and the
beer tax rate for different regions of a country. Would it be sensible to include i)
the amount of beer consumption or ii) the number of miles driven as explanatory
variables in your regression?
b) Suppose you want to investigate the question whether pesticide use of farmers has an
effect on the health expenditures of farmers. When regressing health expenditures
on pesticide usage amounts, does it make sense to include the number of doctor
visits as a control variable?
3) (Adapted from Stock/Watson)
Consider the results from a study comparing total compensation among top executives
in a large set of U.S. public corporations in the 1990s. Let Female be a dummy variable
that is equal to 1 for females and 0 for males.
a) A simple regression of the logarithm of earnings on Female yields

log(Earnings)^ = 6.48 − 0.44 Female
                (0.01)   (0.05)

Interpret the coefficient of Female.

b) Two new variables, the market value of the firm (a measure of firm size, in millions
of dollars) and stock return (a measure of firm performance, in percentage points),
are added to the regression:

log(Earnings)^ = 3.86 − 0.28 Female + 0.37 log(MarketValue) + 0.004 Return
                (0.03)   (0.04)       (0.004)                 (0.003)

Interpret the coefficient of log(MarketValue). Further, explain why the coefficient
of Female has changed from the regression in a).

c) What would happen to your regression if the market value of firms is measured in
billions?
4) (Adapted from Stock/Watson)
Load the dataset CPS04.gdt into GRETL. The data are from the Current Population
Survey of the U.S. Department of Labor.

a) Run a regression of the logarithm of average hourly earnings (AHE) on age (Age),
gender (Female) and education (Bachelor). If Age increases from 33 to 34, how are
earnings expected to change?
b) Run a regression of log(AHE) on log(Age), Female and Bachelor. If Age increases
from 33 to 34, how are earnings expected to change?
c) Run a regression of log(AHE) on Age, Age², Female and Bachelor. If Age increases
from 33 to 34, how are earnings expected to change?
d) Do you prefer the regression in b) to the regression in a)? Explain.
e) Do you prefer the regression in c) to the regression in a)? Explain.
f) Do you prefer the regression in c) to the regression in b)? Explain.
g) Plot the regression relation between Age and log(AHE) from c) for females with a
Bachelor degree.

Additional exercises

1) (Adapted from Verbeek)


Explain why it is inappropriate to drop two explanatory variables from the model at the
same time on the basis of their t-statistics only.
2) (Adapted from Stock/Watson)
Suppose you want to analyze the relationship of class size and student performance as
measured by some test score. Talking with a teacher you get the following comment:
"In my experience, students do well when the class size is less than 20 students and do
poorly when the class size is greater than 25. There are no gains from reducing class size
below 20 students, the relationship is constant in the intermediate region between 20 and
25 students, and there is no loss to increasing class size when it is already greater than
25."
If the teacher is right, how should you choose the functional form of your model?

Solutions to last week's exercises

1)
Given that V{√N (b − β)} = V{√N b} ≈ A and using that Â is a consistent estimator
of A, in large samples we have

V{√N b} ≈ Â
V{b} ≈ (1/N) Â

V̂{b} = (1/10000) · [400 100; 100 900]

V̂{b1} = 400/10000 = 0.04 ,   V̂{b2} = 900/10000 = 0.09

The standard errors are equal to the square roots of the estimated variances, so that we
have se(b1) = 0.2 and se(b2) = 0.3. For the t-statistics we have t1 = b1/se(b1) = 10 and
t2 = b2/se(b2) = 10.
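A quick numerical check of this calculation in Python (using the numbers as given in the exercise; only illustrative):

import numpy as np

N = 10000
b = np.array([2.0, 3.0])                    # estimated coefficients from the exercise
A_hat = np.array([[400.0, 100.0],
                  [100.0, 900.0]])          # consistent estimate of the asymptotic variance

V_b = A_hat / N                             # V{b} is approximately A/N in large samples
se = np.sqrt(np.diag(V_b))                  # asymptotic standard errors
t = b / se                                  # t-statistics

print(se, t)                                # se = [0.2, 0.3], t = [10., 10.]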
2)
a) The hypothesis that β3 = 1 can be tested by means of a t-test. The test statistic is

t = (b3 − 1) / se(b3) ,

which, under the null hypothesis, has an approximate standard normal distribution
in large samples and a t distribution with (N − 3) degrees of freedom in small samples
under the assumption of normality of the error term. At the 95% confidence level,
we reject the null in large samples if |t| > 1.96 (two-tailed test).

b) The hypothesis that β2 + β3 = 0 can also be tested by means of a t-test.
The test statistic is

t = (b2 + b3) / se(b2 + b3) .

To calculate se(b2 + b3) we use the estimated covariance matrix V̂{b} and the fact
that V{b2 + b3} = V{b2} + V{b3} + 2 cov{b2, b3}.
Alternatively, you can rewrite the model as

y_i = β1 + (β2 + β3) x_i2 + β3 (x_i3 − x_i2) + ε_i
    = β1 + γ2 x_i2 + β3 (x_i3 − x_i2) + ε_i

and apply the usual t-test for γ2.

c) The joint hypothesis that β2 = β3 = 0 can be tested by means of the overall F-test,
which is a special case of the general F-test. The test statistic is

F = (R²/2) / ((1 − R²)/(N − 3)) .

We compare the test statistic with the critical values from an F distribution with
2 (the number of restrictions) and N − 3 degrees of freedom.

d) The joint hypothesis that β2 = 0 and β3 = 1 can be tested by means of the general
F-test. For the general joint linear null hypothesis H0: Rβ = q we have in our
case

R = [0 1 0; 0 0 1]   and   q = (0, 1)' .

The test statistic is

F = (Rb − q)' (R(X'X)⁻¹R')⁻¹ (Rb − q) / (J s²) ,

where one has to insert R and q, the estimated parameter vector b, the regressor
matrix X, the number of restrictions J (here J = 2) and the estimated error variance
s² = 1/(N − K) · Σ_{i=1}^{N} e_i². Note that the matrix R has nothing to do with the
goodness-of-fit measure R².
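The general F statistic can be computed directly from this formula. A small Python sketch with simulated data (all numbers illustrative, not taken from the exercise):

import numpy as np

def f_statistic(b, X, s2, R, q):
    # F = (Rb - q)' [R (X'X)^{-1} R']^{-1} (Rb - q) / (J s^2)
    XtX_inv = np.linalg.inv(X.T @ X)
    diff = R @ b - q
    J = R.shape[0]                            # number of restrictions J
    return float(diff @ np.linalg.inv(R @ XtX_inv @ R.T) @ diff) / (J * s2)

# Illustrative use with simulated data where the null (beta2 = 0, beta3 = 1) is true
rng = np.random.default_rng(1)
N = 200
X = np.column_stack([np.ones(N), rng.normal(size=N), rng.normal(size=N)])
y = 1 + X[:, 2] + rng.normal(size=N)
b = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ b
s2 = e @ e / (N - X.shape[1])

R = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])               # tests beta2 = 0 and beta3 = 1
q = np.array([0.0, 1.0])
print(f_statistic(b, X, s2, R, q))            # should be small when H0 is true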
3)
a) Model 1: OLS, using observations 1-100
   Dependent variable: Y

            Coefficient    Std. Error    t-ratio    p-value
   const    0.0541118      0.0922596     0.5865     0.5589
   X2       0.621068       0.680645      0.9125     0.3638
   X3       0.402881       0.595978      0.6760     0.5007
   X4       0.497742       0.295766      1.6829     0.0957
   X5       0.601589       0.0897158     6.7055     0.0000

   Mean dependent var   0.000323     S.D. dependent var   1.554198
   Sum squared resid    79.71127     S.E. of regression   0.916005
   R²                   0.666672     Adjusted R²          0.652637
   F(4, 95)             47.50110     P-value(F)           7.09e-22
   Log-likelihood       -130.5559    Akaike criterion     271.1118
   Schwarz criterion    284.1376     Hannan-Quinn         276.3836

The first three variables are not significant, but the R² is quite large, indicating a
good overall fit. The large standard errors may partly be caused by multicollinearity.
b) First of all you compute the correlation matrix between the regressors:

   Correlation coefficients, using the observations 1-100
   5% critical value (two-tailed) = 0.1966 for n = 100

          X2        X3        X4        X5
   X2   1.0000    0.8816    0.4565    0.0328
   X3             1.0000    0.0123    0.0816
   X4                       1.0000    0.1114
   X5                                 1.0000

The correlation between x2 and x3 is quite large (0.88), indicating multicollinearity.
Also the correlation between x2 and x4 is notable.
Next, run the auxiliary regression of each of the regressors on the remaining
ones and look at the resulting R_j². You get R2² = 0.97, R3² = 0.96, R4² = 0.89,
and R5² = 0.024. This indicates strong multicollinearity between the first three
variables, but excludes x5.
4)
a) This test is automatically performed by GRETL. The t-statistic is 1.342 and the
associated p-value is 0.1831. Thus, we cannot reject H0 at conventional significance
levels.
b) The t-statistic for this test is

t = (0.700232 − 1) / 0.0928652 = −3.228 .

Using the p-value finder of GRETL (see the GRETL introduction) gives a p-value
of 0.00177876 for the two-tailed t-test, so that H0 is clearly rejected. Note that we
have to choose 88 − 4 = 84 degrees of freedom.
c) Similar to exercise 2b) you have two options. First, in the model window, you may
go to Tests → Linear restrictions and type b[2] + b[4] = 0. Second, rearrange the
model by defining a new variable equal to the difference of log(sqrft)_i and bedrooms_i
(this can be done in GRETL via Add → Define new variable, varname = lsqrft - bdrms)
and re-estimate the model with the regressors log(sqrft)_i, log(lotsize)_i and
varname. The standard t-test for the coefficient of log(sqrft)_i in the revised model
can then be applied. The result is a t-statistic of 8.918, which corresponds to a
p-value of 8.68e-014, leading to a very clear rejection of the null hypothesis.
d) Since we have a joint hypothesis, we have to use the F-test now. Go again to
Tests → Linear restrictions, and type b[2] = 1 and b[4] = 0 (see Help for
explanations). The test statistic is F = 5.25722 with a p-value of 0.00706007. Thus
- not surprisingly after the result from part b) - we reject our joint hypothesis.
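If you want to reproduce these gretl tests in another environment, the following Python/statsmodels sketch performs the same restriction tests. The CSV file name and the use of statsmodels are assumptions; the column names follow the hprice1 dataset mentioned above:

import pandas as pd
import statsmodels.formula.api as smf

# df is assumed to hold the hprice1 data with columns lprice, lsqrft, llotsize, bdrms
df = pd.read_csv("hprice1.csv")

res = smf.ols("lprice ~ lsqrft + llotsize + bdrms", data=df).fit()

print(res.t_test("bdrms = 0"))                 # part a): beta4 = 0
print(res.t_test("lsqrft = 1"))                # part b): beta2 = 1
print(res.t_test("lsqrft + bdrms = 0"))        # part c): beta2 + beta4 = 0
print(res.f_test("lsqrft = 1, bdrms = 0"))     # part d): joint hypothesis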

Solution to last week's additional exercises

1)
The statistic for the F-test is given by (see lecture 3)

F = (Rb − q)' (R(X'X)⁻¹R')⁻¹ (Rb − q) / (J s²)
  = (Rb − q)' (R s²(X'X)⁻¹ R')⁻¹ (Rb − q) / J .

The corresponding Wald statistic is (see again lecture 3)

W = (Rb − q)' (R V̂{b} R')⁻¹ (Rb − q) .

Since the standard estimate for the covariance matrix V{b} under the assumption
ε ∼ N(0, σ²I) is s²(X'X)⁻¹, we see that F and W only differ by the factor 1/J. Moreover,
under H0, F follows an F distribution with J and N − K degrees of freedom and W is
asymptotically χ²_J distributed. Using the definition of the F distribution (see lecture 1)
we can write F as

F = (χ²_J / J) / (χ²_{N−K} / (N − K)) ,

thus as a ratio of χ² distributed random variables. For large N, we can use the result that,
due to the definition of the χ² distribution (see lecture 1) and the Law of Large Numbers,

χ²_{N−K} / (N − K) = 1/(N − K) · Σ_{i=1}^{N−K} U_i² → E{U_i²} = V{U_i} = 1 ,

so that JF is asymptotically χ²_J distributed.

We reject the null hypothesis if F > F_{J, N−K; 1−α} or, equivalently, if JF > J · F_{J, N−K; 1−α}. As
we have shown, the null distribution of JF converges to a χ²_J distribution as N → ∞,
so that asymptotically J · F_{J, N−K; 1−α} = χ²_{J; 1−α}. Since the latter is the critical value of the
Wald test and JF = W, the F-test and the Wald test are asymptotically equivalent and
consequently do not differ much in large samples. In small samples, the F-test can be
shown to be more conservative: there can be cases where the Wald test rejects H0 and
the F-test does not, but not vice versa.

Course manual Econometrics: Week 5

General information

This week we treat the problem of heteroscedasticity, that is, a violation of the assumption


of constant error variances. We study the consequences for estimation and inference
using OLS, more ecient estimation by generalized least squares and how to test for
heteroscedasticity.

Reading

Obligatory reading: Verbeek, Sections 4.1-4.5


Additional reading: Wooldridge, Chapter 8.
Greene, Chapter 8.

Exercises

1)
Consider the model

y_i = β1 + β2 x_i2 + ε_i

where the error terms are uncorrelated but V{ε_i|X} = σ² x_i2².
a) Explain how an appropriate Generalized Least Squares estimator can be constructed.
Use the notation of the lecture and be specific about your choice of Ψ, P, h_i and β̂.
b) What is the interpretation of σ²?
c) How can you test the assumed relation of the error term and x_i2?
2)
Consider the model

y_i = β1 + β2 x_i2 + β3 d_i + ε_i

where d_i is a dummy variable taking values 0 and 1. We assume that the error terms are
uncorrelated but V{ε_i|X} = σ1² if d_i = 1 and V{ε_i|X} = σ2² if d_i = 0.
a) Explain how an appropriate Generalized Least Squares estimator can be constructed.
Use the notation of the lecture and be specific about your choice of Ψ, P, h_i and β̂.
b) How can you get standard errors for your GLS estimator?
c) How can you graphically inspect your assumption for the error term?
3)
Use the dataset 401ksubs.gdt, which contains the variable net total financial assets
(nettfa, in $1000s) and several other variables which may explain a person's financial
wealth. Consider the regression

nettfa_i = β1 + β2 inc_i + β3 age_i + β4 age_i² + β5 marr_i + ε_i

a) Estimate the model by OLS.
b) According to your model, at what age is financial wealth supposed to be lowest?
c) Test for heteroskedasticity using the White test and the Breusch-Pagan test.
d) Given your results from part c), does it make sense to perform additionally the
Goldfeld-Quandt test based on subsamples for married and unmarried persons?
e) Use your insights from part c) to construct an appropriate Weighted Least Squares
estimator of the model. Compare the parameter estimates and standard errors with
the OLS approach.
f) Apply heteroskedasticity-robust standard errors to your OLS estimator and to the
WLS estimator as well. Why may the latter be sensible?

Additional exercises

1)
Which of the following are consequences of heteroscedasticity?
a) OLS is inconsistent.
b) OLS is biased.
c) OLS is inefficient.
d) s²(X'X)⁻¹ is an inconsistent estimate of the covariance matrix of the OLS estimator.
e) The t and F tests as presented in lecture 3 are no longer valid.

Solution to last week's exercises

1)
a) Generally, the elasticity of a variable y with respect to x is defined as

el := (∂y/y) / (∂x/x) = (∂y/∂x) · (x/y) .

Hence the elasticity can be interpreted as the approximate percentage change of y
for each 1% change of x.
For model (1),

el1 = (∂y_i/∂x_i) · (x_i/y_i) = (∂y_i/y_i) / (∂x_i/x_i) = ∂ log y_i / ∂ log x_i = β2 ,

so that we obtain a constant elasticity of el1 = β2. Note that for this interpretation
we must use the natural logarithm.

b) In case of model (2) the elasticity corresponds to

el2 = (∂y_i/∂x_i) · (x_i/y_i) = β2 · x_i/y_i .

That means the elasticity is not constant and instead depends on the current values
of x and y.

c) For model (3),

β2 = ∂ log(y_i)/∂x_i = (∂y_i/y_i) / ∂x_i ,

which is called the semi-elasticity of y with respect to x. In words, the semi-elasticity
is the approximate percentage change of y given a one unit increase of x.
(For further practice you might calculate the elasticity for model (3) as well.)

2)
a) It would not be sensible to include the amount of beer consumption as an explanatory
variable, although it may be significant and increase the goodness-of-fit of your
regression. If you include beer consumption, the coefficient of beer tax gives you the
effect of the beer tax on traffic fatalities, holding beer consumption constant. Since a
beer tax is only supposed to work if it reduces beer consumption, this does not make
sense. In contrast, including the miles driven in a region is sensible. It is likely to
be an important explanatory variable and it still allows the desired interpretation
of the beer tax coefficient.
b) Including the number of doctor visits as a control variable is not sensible, although
doctor visits are likely to be highly significant. This is because you would estimate
the effect of pesticide usage on health expenditures, holding the number of doctor
visits constant. Thus, you would only estimate the effect of pesticide usage on
health expenditures that did not arise together with doctor visits. This is probably
not what you are interested in.

3)
a) The earnings of females are estimated to be on average 44% lower than those of
males. More formally, −0.44 is the semi-elasticity of earnings with respect to
gender (see exercise 1).
b) A 10% increase of the market value of a firm is estimated to increase the earnings
of top executives by 3.7%, ceteris paribus. The coefficient of Female is now lower in
absolute terms, implying 28% lower earnings for females holding the other variables,
especially market value, constant. Note that the regression from part a) suffers from
omitted variable bias if i) gender is correlated with the omitted variables like, for
instance, log(MarketValue), and ii) the omitted variables have non-zero coefficients.
Since log(MarketValue) is highly significant, the omitted variable bias is likely to
occur due to a negative correlation of Female and log(MarketValue), i.e. larger firms
having fewer female top executives.
c) Since the coefficient of log(MarketValue) is an elasticity it has no dimension, so that
the interpretation would not change. Further, it is possible to show that running the
regression with market value measured in billions would only change the intercept
(here it would change by 0.37 · log(1000)) and leave all other coefficients unchanged.
4)
a) The coefficient of Age is 0.0244429. Since it represents a semi-elasticity, this means
that increasing Age by one unit, for instance from 33 to 34, is expected to lead to
a 2.4% increase in hourly earnings.
b) The coefficient of log(Age) is 0.724697, giving the estimated elasticity of earnings
with respect to age. From 33 to 34, age increases by about 3%, so that the expected
increase in earnings is 0.03 · 0.724697 = 0.02196, i.e. about 2.2%.
c) When Age increases from 33 to 34, the predicted change in log(AHE) is

(0.147045 · 34 − 0.00207056 · 34²) − (0.147045 · 33 − 0.00207056 · 33²) = 0.0083 ,

meaning a 0.83% increase in earnings.
d) The regressions from a) and b) contain the same number of parameters and can
thus be compared by their goodness-of-fit as measured by the R². Thus, we would
prefer b), which has a slightly higher R² (0.192685 vs. 0.192372).
e) The regression from c) is the same as a) but augmented with Age². Thus, the R²
must be higher for c), but this does not necessarily mean that we prefer c). However,
since Age² is highly significant (p-value of 0.0031) and the Akaike Information
Criterion decreases from 10163.48 to 10156.74, we indeed prefer model c).
f) Noting that now no model is nested in the other one, we stick again to the AIC.
Since 10156.74 < 10160.39, we prefer again model c).

g) For females with a Bachelor degree, the regression relation is

log(AHE)^ = 0.0587332 − 0.179787 · 1 + 0.405077 · 1 + 0.147045 Age − 0.00207056 Age²
log(AHE)^ = 0.2840232 + 0.147045 Age − 0.00207056 Age²
AHE^ = exp(0.2840232 + 0.147045 Age − 0.00207056 Age² + 0.5 · 0.456876²)
     = exp(0.388391039 + 0.147045 Age − 0.00207056 Age²)

Note that we added half the estimated error variance for predicting AHE (see slide
9, lecture 4). In GRETL, go to Tools → Plot a curve, and use the formula
given above, i.e. exp(0.388391039 + 0.147045 x − 0.00207056 x²), and specify a
reasonable range for x (age), for instance 20-60.
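Outside of gretl, the same curve can be drawn with a few lines of Python; this is only a sketch using the coefficients quoted above:

import numpy as np
import matplotlib.pyplot as plt

age = np.linspace(20, 60, 200)
# Fitted relation for females with a Bachelor degree (coefficients from the text above)
ahe_hat = np.exp(0.388391039 + 0.147045 * age - 0.00207056 * age**2)

plt.plot(age, ahe_hat)
plt.xlabel("Age")
plt.ylabel("Predicted AHE")
plt.show()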

Solution to last week's additional exercises

1)
Two explanatory variables may have low t-statistics if they are highly correlated, even if
at least one of their true coefficients is nonzero. A joint test (F-test) takes this correlation
into account. Put differently, if you drop one of the two variables from the model, the
remaining one may become (highly) significant, and you will not observe this if you remove
two variables at once. As an example, assume you have both a short-term and a long-term
interest rate in a model explaining investments. Given the high correlation between
the two interest rates, both may have fairly low t-statistics. However, if you drop one of
them, the remaining interest rate will pick up much of the explanatory power of the two,
and (probably) will be statistically significant.
2)
The teacher has the hypothesis that there is no linear effect of class size on student
performance. Rather, there are three categories (<20, 20-25, >25), which have different
effects but without any effects of changes within a category. Such a relationship can be
modelled using dummy variables. In this case, define

d1 = 1 if 20 ≤ class size ≤ 25 , and d1 = 0 otherwise
d2 = 1 if class size > 25 , and d2 = 0 otherwise

Running a linear regression with these dummy variables gives coefficients with the following
interpretation. The coefficient of d1 is the ceteris paribus effect of increasing class size
from below 20 (the base category) to 20-25 on test scores. The coefficient of d2 is the
effect of increasing class size from below 20 to above 25 on test scores. In practice, you
should compare the goodness-of-fit of such a dummy variable specification with other
alternatives like the simple linear specification.

Course manual Econometrics: Week 6

General information

This week we treat a second common violation of the standard assumptions, namely
autocorrelation. As last week, we look at consequences for OLS, alternative estimation,
tests for autocorrelation and how to compute robust standard errors.

Reading

Obligatory reading: Verbeek, Sections 4.6-4.11


Additional reading: Wooldridge, Chapter 12.
Greene, Some parts of Chapter 19.

Exercises

1)
The plots on the following page were generated from first-order autoregressive processes
of the following form:

ε_t = ρ ε_{t−1} + v_t ,   t = 1, ..., 100 ,   v_t ∼ N(0, 1)

The choices for ρ are −0.9, 0 and 0.9. Which plot refers to which value of ρ?
2)
Verify the approximation of the Durbin-Watson statistic given in the lecture,

dw ≈ 2 − 2 ρ̂ ,

where ρ̂ is an estimate of the first-order autocorrelation of the error terms.
3)
a) Explain why autocorrelation may arise because of an omitted variable. Give examples.
b) Imagine you have a linear regression model with the dependent variable being annual
stock returns observed at a monthly frequency. What are the consequences?
[Three time-series plots of the simulated processes (t = 1, ..., 100) are shown here.]

4)
Load the dataset gasoline.gdt. Consider the following two regressions, which both try to
explain per capita gasoline consumption in the US for the years 1960-1995:

log(G_t/Pop_t) = β1 + β2 log(Pg_t) + β3 log(Y_t) + ε_t
log(G_t/Pop_t) = β1 + β2 log(Pg_t) + β3 log(Y_t) + β4 log(Pnc_t) + β5 log(Puc_t) +
                 β6 log(Ppt_t) + β7 log(Pd_t) + β8 log(Pn_t) + β9 log(Ps_t) + β10 t + ε_t

a) Estimate both models by OLS and plot the residuals against time. What do these
graphs tell about possible autocorrelation in each of the two models?
b) Test for autocorrelation by i) regressing the residuals on their first lag and applying
a t-test, ii) using the Durbin-Watson test and iii) using the Breusch-Godfrey test
with up to 3 lags. Do the results fit to your interpretation in part a)?
c) Apply Newey-West (HAC) standard errors to the augmented model.
5)
Load the dataset ukrates.gdt, which is a time-series dataset consisting of monthly short-term
(variable rs) and long-term interest rates (variable r20) on U.K. government securities.
Consider the regression

Δrs_t = β1 + β2 Δr20_{t−1} + ε_t

where Δrs_t = rs_t − rs_{t−1} and Δr20_{t−1} = r20_{t−1} − r20_{t−2}. The model can be interpreted
as a simple monetary policy reaction function.
a) Estimate the model by Ordinary Least Squares and test for first-order autocorrelation
using i) a regression of the residuals upon their first lag and ii) the Durbin-Watson test.
b) Re-estimate your model using Newey-West (HAC) standard errors.
c) Re-estimate your model by applying the Feasible Generalized Least Squares estimator
by Cochrane and Orcutt and interpret the differences.
d) Re-estimate your model by applying the Prais-Winsten estimator, which is identical
to the Cochrane-Orcutt procedure but additionally uses the first observation.

Additional exercises

1)
Which of the following are consequences of autocorrelation?
a) OLS is inconsistent.
b) OLS is biased.
c) OLS is inefficient.
d) s²(X'X)⁻¹ is an inconsistent estimate of the covariance matrix of the OLS estimator.
e) The t and F tests as presented in lecture 3 are no longer valid.
f) The model is misspecified.

Solution to last week's exercises

1)
a) Our assumption is that

V{ε|X} = σ² Diag{x_i2²} = σ² Ψ ,   Ψ = Diag(x_12², ..., x_N2²) .

Therefore, h_i = x_i2. Since Ψ⁻¹ = P'P,

P = Diag{1/x_i2} .

Applying the transformation matrix P to y and X gives the transformed model

P y = P X β + P ε
y_i/x_i2 = β1 (1/x_i2) + β2 + ε_i/x_i2 ,   i = 1, ..., N
y_i* = β1 x_i1* + β2 x_i2* + ε_i* ,   i = 1, ..., N

The GLS estimator can be written as

β̂ = ( Σ_{i=1}^{N} (1/x_i2²) x_i x_i' )⁻¹ Σ_{i=1}^{N} (1/x_i2²) x_i y_i

or

β̂ = ( Σ_{i=1}^{N} x_i* x_i*' )⁻¹ Σ_{i=1}^{N} x_i* y_i* ,

where x_i = (x_i1, x_i2)' = (1, x_i2)' and x_i* = x_i/x_i2.

b) Since

V{ε_i*|X} = V{ε_i/x_i2 | X} = (1/x_i2²) V{ε_i|X} = σ² x_i2²/x_i2² = σ² ,

σ² is the variance of the error in the transformed model and not equal to the variance
in the original model.

c) Since the assumption is that the variance of the errors is proportional to x_i2²,

V{ε_i|X} = E{ε_i²|X} = σ² x_i2² ,

we can simply use estimates of ε_i², for instance squared residuals taken from OLS,
to perform the auxiliary regression

e_i² = γ1 + γ2 x_i2² + v_i ,   i = 1, ..., N .

If our assumption is correct, γ1 = 0 and γ2 ≠ 0, which can be tested separately by
t-tests.
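A minimal numerical sketch of this GLS/WLS estimator in Python (simulated data, not part of the course software); in this design, statsmodels' WLS with weights 1/x_i2² is the same as running OLS on the transformed model:

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
N = 500
x2 = np.exp(rng.normal(size=N))                  # positive regressor
eps = rng.normal(size=N) * x2                    # V{eps_i} = sigma^2 * x_i2^2
y = 1.0 + 2.0 * x2 + eps

X = sm.add_constant(x2)

ols = sm.OLS(y, X).fit()
wls = sm.WLS(y, X, weights=1.0 / x2**2).fit()    # GLS with h_i = x_i2

print(ols.params, wls.params)
print(ols.bse, wls.bse)                          # WLS standard errors are typically smaller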

2)
a) Denote the number of observations with d_i = 1 by N1 and the number of observations
with d_i = 0 by N2, and order the observations such that those with d_i = 1 come first.
Then our assumption is that

V{ε|X} = σ² Ψ = Diag(σ1², ..., σ1², σ2², ..., σ2²) ,

with σ1² appearing in the first N1 diagonal positions and σ2² in the remaining N2
positions. Setting σ² = σ1², we thus have h_i = 1 for i = 1, ..., N1 and h_i = σ2/σ1 for
i = N1 + 1, ..., N. The transformation matrix P (with Ψ⁻¹ = P'P) is

P = Diag(1, ..., 1, σ1/σ2, ..., σ1/σ2) ,

i.e. the observations of the second group are rescaled by σ1/σ2. The GLS estimator is

β̂ = ( Σ_{i=1}^{N1} x_i x_i'/σ1² + Σ_{i=N1+1}^{N} x_i x_i'/σ2² )⁻¹
    ( Σ_{i=1}^{N1} x_i y_i/σ1² + Σ_{i=N1+1}^{N} x_i y_i/σ2² ) .

b) The covariance matrix of β̂ can be estimated as

V̂{β̂} = σ̂² (X'Ψ⁻¹X)⁻¹ ,   with   σ̂² = 1/(N − 3) · (y − Xβ̂)'Ψ⁻¹(y − Xβ̂) ,

where the unknown variances σ1² and σ2² in Ψ are replaced by estimates σ̂1² and σ̂2².
The standard errors of the GLS estimates are the square roots of the diagonal elements
of V̂{β̂}.

c) Before one would apply the GLS approach shown above, one should inspect the
underlying assumption. This can be done by estimating the model by OLS and
plotting the residuals separately for the subsamples with d_i = 1 and d_i = 0, respectively.
If there are too many data points for such a plot, a boxplot of the residuals for
both subsamples is preferable.
3)
a) Model 1: OLS, using observations 1-9275
   Dependent variable: nettfa

             Coefficient    Std. Error    t-ratio     p-value
   const      9.18133       10.0688        0.9119     0.3619
   inc        1.05132        0.0272112    38.6356     0.0000
   age       −2.29337        0.488324     −4.6964     0.0000
   sq_age     0.0385523      0.00560576    6.8773     0.0000
   marr      10.0488         1.33773       7.5118     0.0000

b) We are looking for a local minimum of our regression function with respect to age.
Taking the partial derivative with respect to age yields the first-order condition

−2.29337 + 2 · 0.0385523 · age = 0 .

Solving this equation leads to age = 29.74. It is easily seen that the second (partial)
derivative is positive, so that we indeed have a minimum at about age 30.

c) The null hypothesis of homoskedasticity is clearly rejected by both the White test
and the Breusch-Pagan test. Age and income especially seem to be sources of
heteroskedasticity.

d) No. On the one hand, we have already seen in the Breusch-Pagan test that marriage
is likely to be another source of heteroskedasticity. More importantly, the
Goldfeld-Quandt test considers only the heteroskedasticity induced by marriage
and neglects any other sources like age and income, which are apparently present
in our case.

e) The regression from the Breusch-Pagan test seems to be a sensible starting point for
a Weighted Least Squares approach. However, if we used it directly we would
have no guarantee of getting positive variance estimates and thus could get negative
weights. Instead, we might consider the same variables but using the multiplicative
model presented in the lecture. The corresponding auxiliary regression is

log(e_i²) = γ1 + γ2 inc_i + γ3 age_i + γ4 age_i² + γ5 marr_i + v_i

To do so in GRETL, go to Save → Squared residuals in the model window, which
generates the corresponding new variable. Taking the log of the squared residuals
and performing the auxiliary regression, we get the interesting result that marriage
is no longer a significant source of heteroskedasticity. Dropping marr_i gives the
following results:

Model 3: OLS, using observations 1-9275
Dependent variable: l_usq2

           Coefficient    Std. Error      t-ratio      p-value
const      −5.24314       0.390843       −13.4149      0.0000
inc         0.0426766     0.000988221     43.1853      0.0000
age        −0.153820      0.0189863       −8.1016      0.0000
sq_age      0.00243750    0.000217944     11.1841      0.0000

Using the predictions from this auxiliary regression (Save → Fitted values) and
reversing the logarithmic transformation we have

ĥ_i² = exp(−5.24314 + 0.0426766 inc_i − 0.153820 age_i + 0.00243750 age_i²)

Note that it does not matter if we keep or drop the estimated constant γ̂1, since
including it means multiplying all observations with a constant, which does not change
the result. For WLS estimation, go in GRETL to Model → Other linear models →
Weighted Least Squares and choose 1/ĥ_i² as the weight variable (see Help for
details). Doing so gives

Model 4: WLS, using observations 1-9275
Dependent variable: nettfa
Variable used as weight: h_sq_inv

           Coefficient    Std. Error    t-ratio     p-value
const      0.337574       5.19623        0.0650     0.9482
inc        0.550563       0.0225816     24.3810     0.0000
age        0.840889       0.271119       3.1016     0.0019
sq_age     0.0167648      0.00339816     4.9335     0.0000
marr       3.93802        0.586034       6.7198     0.0000

The difference in the parameter estimates is remarkably high. As expected from
theory, the standard errors are lower under the WLS approach.

f) Go to Edit → Modify Model and choose robust standard errors. Under Configure
you get several options for heteroskedasticity-robust standard errors. Option
HC0 refers to the original White standard errors which you have seen in the lecture.
Applying this option to our models increases the standard errors in the OLS
model considerably, while the WLS standard errors remain of a similar magnitude.
Nevertheless, using robust standard errors within WLS may be sensible, since the
model for the variance of the error term will only be an approximation to reality, so
that heteroskedasticity may well still be present.
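The same two-step procedure can be sketched outside gretl. A hedged Python/statsmodels version follows; the CSV file name is an assumption and the column names (nettfa, inc, age, marr) follow the gretl dataset:

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("401ksubs.csv")                      # assumed file name
df["sq_age"] = df["age"] ** 2

# Step 1: OLS and auxiliary regression for the multiplicative variance model
ols = smf.ols("nettfa ~ inc + age + sq_age + marr", data=df).fit()
df["l_usq"] = np.log(ols.resid ** 2)
aux = smf.ols("l_usq ~ inc + age + sq_age", data=df).fit()

# Step 2: WLS with weights 1 / h_i^2, where h_i^2 = exp(fitted values of the auxiliary regression)
h_sq = np.exp(aux.fittedvalues)
wls = smf.wls("nettfa ~ inc + age + sq_age + marr", data=df, weights=1.0 / h_sq).fit()

print(wls.params)
print(wls.HC0_se)        # heteroskedasticity-robust (White) standard errors for the WLS fit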

Solution to additional exercises

1)
a) False. Homoskedasticity is not required for the consistency of OLS.
b) False. Homoskedasticity is not required for unbiasedness of OLS.
c) True. If the error terms are heteroskedastic, the WLS estimator is more efficient in
theory. This, however, only holds if the variances of the error terms are known. In
practice, we have to specify a model for the variance of the error term and estimate
it. Then, there is no guarantee that WLS is more efficient. However, WLS will
often be more efficient, especially if the degree of heteroskedasticity is high.
d) True. In the derivation of this formula we used the assumption of homoskedasticity.
e) True. Although there are certain departures from homoskedasticity where the usual
t- and F-tests are still asymptotically valid, they will generally be invalid. However,
the t-statistic with heteroskedasticity-robust standard errors is asymptotically
normally distributed and the Wald test with a heteroskedasticity-robust covariance
matrix can also be used for inference (see the exercises from week 3 for the relation
of F-tests and Wald tests). Note that exact small-sample distributions (i.e. t- and
F-distributions) are no longer available under heteroskedasticity even if we assume
normality of the error term.

Course manual Econometrics: Week 7

General information

This week we review the properties of OLS and the relevant assumptions. We shall see
examples in which OLS can no longer be saved. The instrumental variables estimator will
be introduced as a solution for these cases.

Reading

Obligatory reading: Verbeek, Sections 5.1-5.3.1


Additional reading: Wooldridge, Chapter 15.
Greene, Chapter 12.

Exercises

1) (adapted from Stock/Watson)

The demand for a commodity is given by Q = β1 + β2 P + u, where Q denotes the quantity,
P denotes the price, and u denotes factors other than price that determine demand.
Supply of the same commodity is given by Q = γ1 + γ2 P + v, where v denotes factors
other than price that determine supply. Assume that u and v both have a mean of zero,
variances σ_u² and σ_v², and are uncorrelated, i.e. cov{u, v} = 0.
a) Solve the two equations to show how Q and P depend on u and v.
b) Calculate cov{P, u} and cov{P, v} and interpret the results.
c) Derive cov{P, Q} and V{P}.
d) A random sample of observations (Q_i, P_i), i = 1, ..., N, is collected, and Q_i is
regressed on P_i. Use the answer from c) to derive the asymptotic Ordinary Least
Squares regression coefficient of P_i.
e) Suppose the OLS estimate from d) is used to estimate the slope of the demand
function, β2. Is the estimated slope (asymptotically) correct, too large or too small?
(Hint: Use the fact that demand curves usually slope down and supply curves slope
up.)
2)
Consider the instrumental variable regression model

y_i = β1 + β2 x_1i + β3 x_2i + ε_i

where x_1i is correlated with ε_i and z_1i is a potential instrument. Which assumption of
the instrumental variables estimator is not satisfied if
a) z_1i is independent of (y_i, x_1i, x_2i)?
b) z_1i = x_2i?
c) z_1i = c x_1i (where c is a constant)?
3) (adapted from Stock/Watson)
During the 1880s, a cartel known as the Joint Executive Committee (JEC) controlled
the rail transport of grain from the Midwest to eastern cities in the United States. The
cartel preceded the Sherman Antitrust Act of 1890, and it legally operated to increase
the price of grain above what would have been the competitive price. From time to time,
cheating by members of the cartel brought about a temporary collapse of the collusive
price-setting agreement.
The data file railway.gdt contains weekly observations on the rail shipping price and other
factors from 1880 to 1886. Suppose that the demand curve for rail transport of grain
is specified as

log(Q_t) = β1 + β2 log(P_t) + β3 Ice_t + ε_t ,

where Q_t is the total tonnage of grain shipped in week t, P_t is the price of shipping a
ton of grain by rail and Ice_t is a binary variable that is equal to 1 if the Great Lakes are
not navigable because of ice. Ice is included because grain could also be transported by
ship when the Great Lakes were navigable. Further, the variable cartel is a dummy variable
for the activity of the cartel.
a) Estimate the demand equation by OLS. What is the estimated value of the demand
elasticity and its standard error?
b) In exercise 1 we have analyzed that the interaction of supply and demand is likely to
make the OLS estimator of the elasticity biased. Consider now using the variable
cartel as an instrumental variable for log(P). Use economic reasoning to argue
whether cartel plausibly satisfies the two conditions for a valid instrument.
c) Regress log(P_t) on cartel_t and Ice_t. What do the results tell you about the quality
of cartel as an instrument?
d) Estimate the demand equation by instrumental variable regression. What is the
estimated demand elasticity and its standard error? Compare the results to the
OLS estimates.
e) Perform the Durbin-Wu-Hausman test and interpret the results.

Additional exercises

1)
Consider the example from slide 3 of lecture 7,

y_t = β1 + β2 y_{t−1} + β3 y_{t−2} + ε_t .

Show the result from the lecture, namely that the assumption E{ε|X} = 0 cannot hold for
this model. (X denotes, as always, the matrix of regressors, which here are lagged values
of y_t.)

Solution to last week's exercises

1)
First plot: ρ = 0; second plot: ρ = 0.9; third plot: ρ = −0.9.
If the autocorrelation is positive and high (ρ = 0.9), the process tends to stay above
(below) its mean (zero) in the next period if it is above (below) its mean in the current
period. For ρ = −0.9 the process tends to reverse its sign from one period to another.
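If you want to reproduce such plots yourself, a short Python simulation sketch (values of ρ as in the exercise; not part of the course software):

import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(3)
T = 100

fig, axes = plt.subplots(1, 3, figsize=(12, 3))
for ax, rho in zip(axes, [-0.9, 0.0, 0.9]):
    eps = np.zeros(T)
    v = rng.normal(size=T)
    for t in range(1, T):
        eps[t] = rho * eps[t - 1] + v[t]      # AR(1): eps_t = rho * eps_{t-1} + v_t
    ax.plot(eps)
    ax.set_title(f"rho = {rho}")
plt.tight_layout()
plt.show()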
2)

dw = Σ_{t=2}^{T} (e_t − e_{t−1})² / Σ_{t=1}^{T} e_t²
   = ( Σ_{t=2}^{T} e_t² + Σ_{t=2}^{T} e_{t−1}² − 2 Σ_{t=2}^{T} e_t e_{t−1} ) / Σ_{t=1}^{T} e_t²
   ≈ 2 − 2 · Σ_{t=2}^{T} e_t e_{t−1} / Σ_{t=1}^{T} e_t²

as the sample size becomes large, because both Σ_{t=2}^{T} e_t² / Σ_{t=1}^{T} e_t² and
Σ_{t=2}^{T} e_{t−1}² / Σ_{t=1}^{T} e_t² tend to 1.
Since Σ_{t=2}^{T} e_t e_{t−1} / Σ_{t=1}^{T} e_t² is an estimator of ρ, dw tends to 2 − 2 ρ̂.
It can be shown that this estimate of ρ is very close to the estimate which results from
regressing e_t on e_{t−1} by OLS.
3)
a) See exercise 4. Another example would be the omission of a variable that describes
seasonal patterns in monthly or quarterly data. Consider for instance the case that
you want to explain the activity in the construction industry based on monthly
data. It could then be that the residuals in your model would have a tendency
to be negative in the winter months and positive in the summer months. A
winter/summer dummy variable could possibly solve this problem. Another typical
example is an omitted lagged dependent variable.

b) Let R_t be the annual return of a stock, i.e. the return over the period [t − 12, t], where
time is measured in months. As R_{t+1} refers to the period [t − 11, t + 1], R_t and R_{t+1}
have 11 months in common and are therefore not stochastically independent and
probably heavily autocorrelated. The autocorrelation of R_t will probably translate
into autocorrelation of the error term in a regression explaining R_t, since unexpected
(stock market) events in one month have a direct influence on R_t over the following
twelve monthly periods. Under autocorrelation, routinely computed standard errors
and tests will be incorrect and misleading, so that robust standard errors (HAC)
are recommended.
4)
[Two figures: regression residuals (= observed − fitted lGpc) plotted against time, 1960–1995; left panel: first model, right panel: augmented model.]
a) The graph on the left hand side refers to the first model whereas the graph on the
right hand side refers to the augmented model. The model with more regressors
seems to suffer less from autocorrelation since the residuals cross the zero line
more often. This is an example where the omission of relevant variables leads to
autocorrelation.
b) i) Regressing the residuals of our model on their first lag (use the lags... option
in the model specification window to add the lagged residual), we get estimated
first-order autocorrelations of 0.948469 (p value 2.49e-014) and 0.268612 (p value
0.1147). Thus, we clearly reject the null hypothesis of no autocorrelation for the
first model whereas we do not reject the null hypothesis for the second model. Of
course, not rejecting the null hypothesis (especially in small samples) does not mean
that the null hypothesis is true.
ii) The Durbin-Watson statistics are 0.172878 and 1.373491, respectively. Looking
at the bounds for the critical values given in the lecture we see that we reject the
null hypothesis clearly for the first model and that we are in the inconclusive region
for the augmented model (the bounds for K=10 are not too far away from the
bounds for K=9).
iii) Running auxiliary regressions of the residuals on their first three lags yields
R² values of 0.847827 and 0.107939. Thus, the Breusch-Godfrey test statistics are
32 × 0.847827 = 27.13 and 32 × 0.107939 = 3.454. Using the p value finder of GRETL
and applying 3 degrees of freedom, we get p values of 5.52923e-006 and 0.326771,
respectively.

Thus, for each test, the results fit the graphical analysis from part a).
c) Applying Newey-West standard errors yields, in our case, standard errors which are
lower than the default standard errors. More often, it is the other way around.
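For those who want to see the mechanics of the Breusch-Godfrey statistic from part b) iii) outside of GRETL, the following sketch (Python with numpy/scipy assumed; e are the OLS residuals and X the regressor matrix including a constant, both hypothetical numpy-array inputs) computes the auxiliary regression and T·R²:

import numpy as np
from scipy import stats

def breusch_godfrey(e, X, p=3):
    # regress the residuals on X and p lagged residuals; T*R^2 is asymptotically
    # chi-squared with p degrees of freedom under the null of no autocorrelation
    T = len(e)
    lags = np.column_stack([np.r_[np.zeros(k), e[:-k]] for k in range(1, p + 1)])
    W = np.column_stack([X, lags])
    coef, *_ = np.linalg.lstsq(W, e, rcond=None)
    u = e - W @ coef
    r2 = 1 - u.var() / e.var()
    stat = T * r2
    return stat, 1 - stats.chi2.cdf(stat, p)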
5)
a) A regression of the residuals on their first lag gives a coefficient of 0.148841, which
is statistically significantly different from zero with a p value of 0.0006. The Durbin-Watson
statistic is 1.702273. Going to Tests → Durbin-Watson p value gives 0.000292831,
so that we also reject H0.
b) The standard errors increase as expected since standard errors that ignore autocorrelation are usually (asymptotically) downward biased. Note that HAC standard
errors are also not unbiased but they are at least asymptotically valid, in contrast
to non-robust standard errors.
c) Going to Model → Time series → Cochrane-Orcutt we can perform the Feasible
Generalized Least Squares approach from the lecture. The iterations lead to a
slightly increased estimated (first-order) autocorrelation. The coefficient of r20_{t-1}
is now somewhat lower than with OLS estimation. The standard errors tend to be
smaller than in part b), pointing to a more efficient estimation than by doing OLS.
Note that a comparison with unadjusted standard errors (part a)) is not meaningful
since these standard errors are invalid under autocorrelation.
d) Go to Model → Time series → Prais-Winsten. The results are very similar.
This is no surprise since the information from one additional observation should
not make a big difference, especially if the sample size is relatively large as is the
case here.

Solution to last weeks additional exercises

1)
a) False.
b) False.
c) True. If the error terms are correlated, an appropriate GLS estimator is more
efficient in theory. This, however, only holds if the correlations (and the variances)
of the error terms are known. In practice, we have to specify a model for the
autocorrelations of the error term and estimate it. Then, there is no guarantee that
FGLS is more efficient. However, FGLS will often be more efficient, especially if
the degree of autocorrelation is high.
d) True. In the derivation of this formula we used the assumption of uncorrelated
error terms.

e) True. However, the t-statistic with HAC standard errors is asymptotically normally
distributed and the Wald test with a HAC covariance matrix can also be used for
inference. Exact small sample distributions (i.e. t- and F-distributions) are no
longer available under heteroskedasticity even if we assume normality of the error
term.
f) Depends. A high degree of autocorrelation may point to misspecification but there
is no general rule that there must be misspecification. In practice, we should just
check the functional form and test additional regressors if these are available.

Course manual Econometrics: Week 8

General information

This week we continue studying instrumental variables estimators in the various situations
in which they are needed. We then generalize the estimator and look at specification tests.

Reading

Obligatory reading: Verbeek, Sections 5.3.2-5.5


Additional reading: Wooldridge, Chapter 15.
Greene, Chapter 12.

Exercises

1)
Consider the following two equations:
y_i = β1 + β2 x_i + u_i
x_i = γ1 + γ2 y_i + v_i
u_i and v_i are the error terms of the models and are assumed to be uncorrelated with each
other, having variances σ_u² > 0 and σ_v² > 0.
a) Show that x_i is correlated with u_i if γ2 ≠ 0.
b) What are the consequences of your finding?
2)
Consider the equations
y_i = β1 + β2 x_i + β3 z_{2i} + β4 z_{3i} + u_i
x_i = γ1 + γ2 y_i + γ3 z_{2i} + v_i
and assume that cov{z_{2i}, u_i} = cov{z_{3i}, u_i} = cov{z_{2i}, v_i} = cov{z_{3i}, v_i} = 0.

a) What do the assumptions mean?


b) Can we consistently estimate β2?
c) Can we consistently estimate γ2?
3)
a) Briefly describe the Two-stage Least Squares (2SLS) approach.
b) Show that the 2SLS estimator derived in the lecture is identical to the (generalized)
Instrumental Variables estimator. Consider the case of overidentification as well as
the case of exact identification.
4)
Consider again the dataset from last week (railway.gdt) and the corresponding regression
log(Q_t) = β1 + β2 log(P_t) + β3 Ice_t + ε_t
a) Use cartel_t, cartel_{t-1} and cartel_{t-2} as instruments for log(P_t) and estimate the
model with the instrumental variables estimator.
b) Why may it be sensible to use cartel_{t-1} and cartel_{t-2} as additional instruments?
c) Given your reasoning from part b), how do you interpret the results from part a)?
d) What is the risk of using additional instruments in general? What about the specific
case in this exercise?
e) Check your reasoning from part d) by performing the specication test from the
end of lecture 8.
f) Re-estimate the model by applying Newey-West (HAC) standard errors. (These
are easily generalized from the OLS case given in the lecture to IV regressions.)
g) Test that the demand elasticity is equal to -1.

Additional exercises

1)
Why does the Instrumental Variable (IV) estimator lead to a smaller R² than the OLS
estimator? What does this say about the R² as a measure for the adequacy of the model?

Solution to last weeks exercises


1)
a)
(1) Q = α1 + α2 P + u
(2) Q = β1 + β2 P + v  ⇒  P = −β1/β2 + (1/β2) Q − (1/β2) v

Substituting P in (1):
Q = α1 + α2 (−β1/β2 + (1/β2) Q − (1/β2) v) + u    | × β2
β2 Q = α1 β2 − α2 β1 + α2 Q − α2 v + β2 u
Q = (α1 β2 − α2 β1)/(β2 − α2) + (β2 u − α2 v)/(β2 − α2)

Substituting Q in (2):
P = −β1/β2 + (1/β2)(α1 + α2 P + u) − (1/β2) v    | × β2
β2 P = −β1 + α1 + α2 P + u − v
P = (α1 − β1)/(β2 − α2) + (u − v)/(β2 − α2)

b)
cov(P, u) = cov( (α1 − β1)/(β2 − α2) + (u − v)/(β2 − α2), u )
= 1/(β2 − α2) · cov(u − v, u)
= (V(u) − cov(u, v))/(β2 − α2)     [cov(u, v) = 0]
= σ_u²/(β2 − α2) ≠ 0

cov(P, v) = cov( (α1 − β1)/(β2 − α2) + (u − v)/(β2 − α2), v )
= 1/(β2 − α2) · cov(u − v, v)
= (cov(u, v) − V(v))/(β2 − α2)     [cov(u, v) = 0]
= −σ_v²/(β2 − α2) ≠ 0

Interpretation: Since cov(P, u) ≠ 0 and cov(P, v) ≠ 0, the regressors are correlated
with the error terms, so that the conditions for the consistency of OLS are not met.

c)
cov(P, Q) = cov( (α1 − β1)/(β2 − α2) + (u − v)/(β2 − α2), (α1 β2 − α2 β1)/(β2 − α2) + (β2 u − α2 v)/(β2 − α2) )
= 1/(β2 − α2)² · cov(u − v, β2 u − α2 v)
= (β2 cov(u, u) − α2 cov(u, v) − β2 cov(v, u) + α2 cov(v, v))/(β2 − α2)²     [cov(u, v) = 0]
= (β2 σ_u² + α2 σ_v²)/(β2 − α2)²

V(P) = V( (α1 − β1)/(β2 − α2) + (u − v)/(β2 − α2) )
= (V(u) + V(v) − 2 cov(u, v))/(β2 − α2)²     [cov(u, v) = 0]
= (σ_u² + σ_v²)/(β2 − α2)²

d)
α̂_{2,OLS} = Σ_{i=1}^n (Q_i − Q̄)(P_i − P̄) / Σ_{i=1}^n (P_i − P̄)²
= ( (1/n) Σ_{i=1}^n (Q_i − Q̄)(P_i − P̄) ) / ( (1/n) Σ_{i=1}^n (P_i − P̄)² )
= ĉov(Q, P)/V̂(P)  →  cov(Q, P)/V(P)  =(c)  (β2 σ_u² + α2 σ_v²)/(σ_u² + σ_v²)   as n → ∞

e)
plim α̂_{2,OLS} − α2 = (β2 σ_u² + α2 σ_v²)/(σ_u² + σ_v²) − α2
= (β2 − α2) σ_u²/(σ_u² + σ_v²) > 0, if β2 > 0 and α2 < 0

Thus, α2 would be overestimated (underestimated in absolute terms) by OLS.


2)
a) z_{1i} is not a relevant instrument because it is not correlated with x_{1i}.
b) The model is not identified because the moment condition for the instrument,
E{(y_i − β1 − β2 x_{1i} − β3 x_{2i}) z_{1i}} = 0, is identical to the moment condition for x_{2i},
E{(y_i − β1 − β2 x_{1i} − β3 x_{2i}) x_{2i}} = 0.
c) z_{1i} is not exogenous because it is also correlated with ε_i
(cov{z_{1i}, ε_i} = cov{c x_{1i}, ε_i} = c · cov{x_{1i}, ε_i}).

3)
a) The estimated demand elasticity is −0.63433 with a standard error of 0.0819697.
b) The variable cartel can be reasonably assumed to be relevant for the supply side
only. That means, cartel is likely to be uncorrelated with any demand shocks and
thus uncorrelated with the error term of the demand equation. Further, cartel
should be correlated with prices since usually cartels use their power to raise prices.
Thus, both conditions for a valid instrument, exogeneity and relevance, are pretty
likely to be met in this case.
c) With a t-ratio of 14.7, cartel is highly significant and seems to have important
explanatory power with respect to prices. This result supports our assumption
that cartel is a relevant instrument.
d) Go to Model → Instrumental variables → Two-stage least squares (which is
another name for the IV estimator given in the lecture) and specify log(Q_t) as
the dependent variable, log(P_t) and Ice_t as independent variables and cartel_t and
Ice_t as instruments. The demand elasticity is now estimated to be −0.872271 with
a standard error of 0.131355. In line with the results from exercise 1, the result
points to an overestimation of the demand elasticity if OLS is applied (in exercise
1 we did not use logarithmic transformations and no additional covariate, but the
results would be similar). Further, the IV estimate is closer to −1, which would be
the demand elasticity expected from economic theory in monopolies. Finally, the
standard error of the IV estimator is larger than that of OLS. This is also expected
from theory and does not tell us that OLS should be preferred.
e) Start with the regression from part c) and save the residuals. Then, estimate the
original model by OLS as in a) but add the residuals from the auxiliary regression.
Under the null hypothesis that both OLS and IV are consistent, the coefficient of
this new variable should be zero. However, we get a coefficient of 0.396247 with
a t-ratio of 2.385 so that we reject the null hypothesis at the 5% level. Note that
in the Durbin-Wu-Hausman test the alternative hypothesis is inconsistency of OLS
but consistency of IV. Thus, consistency of IV is assumed and not tested.
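A compact sketch of this regression-based Durbin-Wu-Hausman test (illustrative only; it assumes Python with numpy and statsmodels rather than the GRETL workflow described above, and the variable names are hypothetical):

import numpy as np
import statsmodels.api as sm

def dwh_test(y, X_exog, x_endog, instruments):
    # first stage: regress the endogenous regressor on the exogenous regressors and the instruments
    first_stage = sm.OLS(x_endog, np.column_stack([X_exog, instruments])).fit()
    v_hat = first_stage.resid
    # augmented structural equation: the t test on v_hat is the DWH test
    augmented = sm.OLS(y, np.column_stack([X_exog, x_endog, v_hat])).fit()
    return augmented.tvalues[-1], augmented.pvalues[-1]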

Solution to last weeks additional exercises

1)
Note that E{ε|X} = 0 means that the expectation of every ε_i conditional on the whole
N × K regressor matrix X = (x_{jk}) is zero, i.e.
E{ε_i | x_{jk}} = 0 for every i, j = 1, ..., N and k = 1, ..., K.
Further note that conditional mean independence implies uncorrelatedness, i.e.
E{ε_i | x_{jk}} = 0  ⇒  cov{ε_i, x_{jk}} = 0.
In our model we have
cov{y_{t-1}, ε_{t-1}} = cov{β1 + β2 y_{t-2} + β3 y_{t-3} + ε_{t-1}, ε_{t-1}}
= V{ε_{t-1}} ≠ 0
if we assume contemporaneous uncorrelatedness of the regressors with the error term, i.e.
cov{y_{t-1}, ε_t} = cov{y_{t-2}, ε_t} = 0.
Consequently, the assumption E{ε|X} = 0 (which is necessary for unbiasedness) is not
met in this case whereas E{ε_i x_i} = 0 (necessary for consistency) might still hold.

Course manual Econometrics: Week 9

General information

This week the method of maximum likelihood estimation is introduced. This estimation
technique is used for a wide range of econometric models and is therefore extremely useful.
Binary choice models are also treated. Further models that require maximum likelihood
estimation are covered in the following week.

Reading

Obligatory reading: Verbeek, 6.1, 7.1.1-7.1.4


Additional reading:
Wooldridge, Relevant parts of Chapter 17
Greene, Relevant parts of Chapter 16 and 23

Exercises

1)
Consider a random variable X with an exponential distribution. The corresponding
density function is given by (λ > 0):
f(x) = λ e^{−λx} for x ≥ 0, and f(x) = 0 for x < 0.
a) Consider a sample of X, denoted by x_1, ..., x_N. Find the maximum likelihood
estimator of λ.
b) Estimate the asymptotic variance of the maximum likelihood estimator λ̂ using
both approaches given in the lecture (the estimators are denoted by V̂_H and V̂_G,
respectively, in the lecture). Are the two estimators different asymptotically? (Hint:
For the last question use the fact E{X} = 1/λ and V{X} = 1/λ².)

2)
Use the expressions for the score contributions on slide 16 of lecture 9 to derive the
information matrix, the asymptotic covariance matrix and an estimator for the variance
of β̂ in large samples for the case that β̂ is the maximum likelihood estimator in the linear
regression model with normally, independently and identically distributed errors. (Hint:
Use the fact that for a normally distributed random variable X with zero mean it holds
that E{X⁴} = 3σ⁴, where σ² is the variance of X.)
3) (adapted from Stock/Watson)
The dataset insurance.gdt provides data about the insurance status of U.S. citizens, represented by the binary variable insured, and further covariates.
a) Regress insured on the variables healthy (a binary variable regarding self-reported
health status), age, male, married and self emp (binary variable; 1 = self-employed) using (i) the Logit and (ii) the Probit specification. Compare the results.
Which specification would you prefer?
b) Compute the marginal effects of your regression evaluated at the mean values of
the regressors. Compare again the Logit and the Probit model.
c) Compute the estimated probabilities that a healthy and married man, aged 40, is
insured if he is (i) self-employed and (ii) not self-employed. Interpret the difference
between (i) and (ii).
d) Imagine that people who are uninsured tend to start self-employment (maybe because they are jobless). Would it then be correct to interpret the marginal effect
from part c) as a causal effect?
e) Augment your model by using age² as an additional regressor. Does this improve
your model? Compute the marginal effect of age.
f) Include the additional regressor male·married. For which persons is the variable
equal to one? Is the effect of marriage on the probability to be insured higher for
men or for women?

Solution to last weeks exercises

1)
a)
cov(u_i, x_i) = cov(u_i, γ1 + γ2 y_i + v_i)
= cov(u_i, γ2 y_i)
= γ2 cov(u_i, β1 + β2 x_i + u_i)
= γ2 (β2 cov(u_i, x_i) + Var(u_i))
= γ2 β2 cov(u_i, x_i) + γ2 σ_u²
⇒ cov(u_i, x_i) − γ2 β2 cov(u_i, x_i) = γ2 σ_u²
⇒ cov(u_i, x_i) = γ2 σ_u² / (1 − γ2 β2)
b) The result from part a) shows that xi is endogenous so that the fundamental assumption for the consistency of Ordinary Least Squares (OLS) is not met. Thus,
OLS is not an appropriate estimator in this case.
2)
a) The assumptions mean that the variables z_2 and z_3 are exogenous for both equations, i.e. we can think of z_2 and z_3 as being generated from outside the model.
b) No. OLS would be inconsistent because of simultaneous equations bias, which can
be derived similarly to exercise 1. We could estimate β2 consistently if we had
a valid instrument for x. However, the variables which are relevant for x
according to the second equation, y and z_2, cannot serve as instruments since y
is, of course, endogenous and z_2 is already included in the first equation so that it
does not provide an additional moment condition.
c) Yes. Although OLS is inconsistent for the same reasons as in b), we could use z_3
as an instrument for y and then consistently estimate γ2.
3)
a) The first stage of the 2SLS approach is an OLS regression of all endogenous regressors on all instruments. For instance, if we have a single endogenous regressor, say
x_k, there is one first-stage regression where x_k is regressed on the other explanatory
variables of the original equation (1, x_2, ..., x_{k-1}, x_{k+1}, ..., x_K) and the excluded instruments. The vector of predicted values from the first-stage regression, x̂_k = Zγ̂
(where Z is the matrix containing the exogenous explanatory variables and the excluded instruments and γ̂ are the OLS estimates from the first-stage regression), is
then substituted for x_k in the original equation and the modified original equation
is then estimated by OLS.

b) From slide 30 in lecture 8, the 2SLS estimator replaces X by its first-stage fitted
values X̂ = Z(Z'Z)⁻¹Z'X:

β̂_2SLS = (X̂'X̂)⁻¹ X̂'y
= ( (Z(Z'Z)⁻¹Z'X)' Z(Z'Z)⁻¹Z'X )⁻¹ (Z(Z'Z)⁻¹Z'X)' y
= ( X'Z(Z'Z)⁻¹Z'Z(Z'Z)⁻¹Z'X )⁻¹ X'Z(Z'Z)⁻¹Z'y
= ( X'Z(Z'Z)⁻¹Z'X )⁻¹ X'Z(Z'Z)⁻¹Z'y = β̂_IV,

which is the (generalized) Instrumental Variables estimator. If R = K, Z'X (and X'Z) is a square matrix and if we assume the regularity
condition that Z'X is invertible (similar to the required existence of (X'X)⁻¹ for
OLS) we have

β̂_IV = (Z'X)⁻¹Z'Z(X'Z)⁻¹ X'Z(Z'Z)⁻¹Z'y
= (Z'X)⁻¹Z'Z(Z'Z)⁻¹Z'y
= (Z'X)⁻¹Z'y
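The algebraic identity can also be checked numerically; the following sketch (Python with numpy assumed, arbitrary generated data, purely illustrative) confirms that the 2SLS formula and the last expression in the overidentified case coincide:

import numpy as np

rng = np.random.default_rng(0)
n, K, R = 200, 2, 3                       # R > K: overidentified case
X = rng.normal(size=(n, K))
Z = rng.normal(size=(n, R))               # stands in for the instrument matrix
y = rng.normal(size=n)

P_Z = Z @ np.linalg.inv(Z.T @ Z) @ Z.T    # projection matrix on the columns of Z
X_hat = P_Z @ X
b_2sls = np.linalg.solve(X_hat.T @ X_hat, X_hat.T @ y)
b_giv = np.linalg.solve(X.T @ P_Z @ X, X.T @ P_Z @ y)
print(np.allclose(b_2sls, b_giv))         # True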

4)
a) The coefficient for the demand elasticity is now −0.985085. (Last week we
estimated demand elasticities of −0.872271 by IV without cartel_{t-1} and cartel_{t-2}
and −0.63433 by OLS.)
b) In general, additional instruments can help to improve the efficiency of the IV
estimator, i.e. they may reduce the variance of the estimator.
c) As expected from part b), the standard error of the demand elasticity is now
0.119144 compared to 0.131355 when cartel_{t-1} and cartel_{t-2} are not included.
d) The risk of using additional instruments is that the exogeneity assumption is not
met for the additional instruments. Then, the IV estimator is not consistent. In our
case, it is hard to argue that cartel_{t-1} and cartel_{t-2} are not exogenous if we argue
that cartel_t is exogenous. Since such reasoning is often sensible, lagged instruments
are quite often applied in practice.
e) Save the residuals from your model and regress them by OLS on the full set of
instruments, i.e. a constant, Ice_t, cartel_t, cartel_{t-1} and cartel_{t-2}. The R² from
this regression is 0.016274, so that our test statistic is 326 × 0.016274 = 5.305. The
test statistic is χ² distributed with R − K = 2 degrees of freedom under the null
hypothesis. Using GRETL's p value finder we get a value of 0.0704748, so that we do
not reject the null hypothesis at a significance level of 5%. Thus, there is no strong
evidence against the exogeneity of our additional instruments.
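The specification test in part e) is easy to reproduce outside of GRETL as well; here is a hedged sketch (Python with numpy/scipy assumed; iv_resid are the IV residuals and Z the full instrument matrix including the constant, both hypothetical numpy-array inputs):

import numpy as np
from scipy import stats

def overid_test(iv_resid, Z, df):
    # regress the IV residuals on all instruments; N*R^2 is chi-squared with
    # df = R - K degrees of freedom under the null of valid instruments
    coef, *_ = np.linalg.lstsq(Z, iv_resid, rcond=None)
    u = iv_resid - Z @ coef
    r2 = 1 - u.var() / iv_resid.var()
    stat = len(iv_resid) * r2             # here: 326 * 0.016274 = 5.305
    return stat, 1 - stats.chi2.cdf(stat, df)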
f) Applying Newey-West standard errors increases the standard error of the demand
elasticity substantially to 0.212288. This is not surprising given that the Durbin-Watson
statistic is 0.461790, so that the first-order autocorrelation is about 1 − 0.461790/2 = 0.769
(see exercise 2 from week 6 and lecture 6, slide 15), pointing to
considerable autocorrelation.

g) Since we have evidence for autocorrelation we should not use t or F distributions, but
we can use the asymptotic result that under the null hypothesis
t = (β̂2 + 1)/se(β̂2) → N(0, 1)
or, equivalently, t² → χ²₁ (the latter is a Wald test). Doing this in GRETL (→ Tests →
Linear restrictions, b[2] = −1) gives a p value of 0.943987, so that the null
hypothesis is not rejected.

Solution to last weeks additional exercises

1)
OLS minimizes the residual sum of squares and therefore maximizes the R². Any other
estimator, including instrumental variables, results in a lower R². Note that we are often
not interested in obtaining an R² that is as high as possible, but in obtaining consistent
estimates for the coefficients of interest that are as accurate as possible. The R² does not
tell us which estimator is the preferred one. The R² tells us how well the model fits the
data (in a given sample) and typically is only interpreted in this way when the model is
estimated by OLS.

Course manual Econometrics: Week 10

1 General information
This week a number of topics will be treated in somewhat less detail. First, binary choice
models will be studied in more detail with emphasis on the measurement of the goodness
of fit and the interpretation of an underlying latent variable. This interpretation becomes
useful for other models introduced this week. Next, the model is extended to multiple
possible outcomes, but we restrict attention to ordered probit and logit models, where one
needs to assume a natural ordering of the variables. The more general case of multinomial
models is not treated, but the interested student may read the relevant section in the
book. Furthermore, so called count data models are treated, that allow modeling the
number of certain events. Finally, the important topic of censored regression with the
tobit model will be treated briefly.
The treatment of these topics is necessarily somewhat superficial. However, it is important to know they exist and what the underlying principles are. If you are interested
in more detail you may consider reading additional sections in Verbeek or look at the
treatment of these models in other textbooks (which may be much better and more
detailed).

2 Reading
Obligatory reading: Verbeek, 7.1.5-7.1.6, 7.2.1-7.2.4, 7.3 and 7.4.1-7.4.3. The part on the
truncated regression on page 234 may be skipped.
Furthermore, you also have to read page 433 in Appendix B.
Additional reading: As usual, the treatment in the other books is different from Verbeek,
so if you are interested read:
in Wooldridge, the relevant parts of Chapter 17
and
in Greene, the relevant parts of Chapter 16 and 23

3 Exercises
1) (adapted from Greene)
Consider the number of tickets demanded for events in a certain sports arena. Denote
this variable by Y*. What is actually observed is the number of tickets sold (Y), which
is equal to Y* if the arena is not sold out and equal to 20000, the maximum capacity,
otherwise. Suppose that the mean number of tickets sold is 18000 and that the arena is
sold out for 25% of all events. Calculate the mean number of tickets demanded under the
assumption that Y* ~ N(μ, σ²). (Hints: Use that
E[Y*|Y* < a] = μ − σ φ((a−μ)/σ)/Φ((a−μ)/σ)
and that φ(0.675)/Φ(0.675) = 0.424.)
2)
The dataset credit.gdt (see Verbeek, 7.2.3) is a sample of US firms in 2005 containing
their Standard & Poor's credit ratings and several explanatory variables. (There seems
to be an error in the formulas in Verbeek, p. 217, where the expressions on the right
hand side lack a 1 in front of them.)
a) Use an Ordered Logit model to regress rating on booklev (book leverage) and
ebit (earnings before interest and taxes/total assets). Interpret the coefficients.
b) What is the probability that a firm with 50% book leverage and 10% ebit has a
rating of 4?
c) What is the probability that a firm with 50% book leverage and 10% ebit has a
rating of 4 or more?
d) Estimate a Logit model using the binary dependent variable invgrade (Investment
grade rating) and the same regressors as before. Compare the results. What is your
answer for c) under this model? (A rating of 4 or more is defined to be investment
grade).
3)
Use the dataset patents.gdt (see Verbeek, 7.3.2) which is a sample of 181 international
manufacturing firms containing data on the number of patent applications (patents),
the expenditures for research and development (R&D) and several other variables.
a) Regress patents on the logarithm of R&D expenditures, the dummy variables for
the different industries (the reference category is food, fuel, metal and others) and
the dummy variables for the country (the reference category is Europe) by using
(i) the Poisson model and (ii) the negative binomial model (NegBin II). Use the
option for robust standard errors.
b) Test for overdispersion in the Poisson model.
c) Interpret the coefficient of log(R&D) in the negative binomial model.
d) Interpret the coefficient of the USA dummy in the negative binomial model.

e) Estimate a linear model instead (by OLS and with patents and not log(patents) as
the dependent variable) and use heteroscedasticity-robust standard errors. (i) Show
that homoscedasticity can be excluded if the Poisson model is correctly specified.
(ii) What is the elasticity of the number of patents with respect to R&D expenditures according to the linear model for a US firm in the computers industry with
R&D expenditures of 500?
f) Plot the actual values for the number of patents against the predicted values (i)
from the negative binomial model and (ii) from the linear model. Compare and
interpret the graphs. Relate your interpretation to the log likelihood values of both
models.

4 Solution to last weeks exercises


1)
a) Since we will only observe non-negative values for X, the likelihood contribution of
the observation x_i, i = 1, ..., N, is just the exponential density for non-negative
values evaluated at x_i:

log L = Σ_{i=1}^N log(λ e^{−λ x_i}) = Σ_{i=1}^N (log(λ) − λ x_i) = N log(λ) − λ Σ_{i=1}^N x_i

∂log L/∂λ = N/λ − Σ_{i=1}^N x_i = 0
⇒ N/λ = Σ_{i=1}^N x_i
⇒ λ̂ = 1/x̄

As we will see in part b), the second derivative of log L is always negative in our
case, so that a maximum is attained.

b)
log L_i = log(λ) − λ x_i
∂log L_i/∂λ = 1/λ − x_i = s_i(λ)
∂²log L_i/∂λ² = −1/λ²

V̂_H = ( (1/N) Σ_{i=1}^N 1/λ̂² )⁻¹ = λ̂²

V̂_G = ( (1/N) Σ_{i=1}^N s_i(λ̂)² )⁻¹ = ( (1/N) Σ_{i=1}^N (1/λ̂ − x_i)² )⁻¹
→ 1/E{(x_i − E{x_i})²} = 1/Var{x_i} = λ² = V

Thus, there is no difference between V̂_H and V̂_G asymptotically. Note that asymptotic standard errors for λ̂ are given by (V̂_H/N)^{1/2} and (V̂_G/N)^{1/2} since √N(λ̂ − λ) →
N(0, V) (compare also to week 3, exercise 1).
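A small simulation sketch (Python with numpy assumed; not part of the original solution) confirms that λ̂ = 1/x̄ and that V̂_H and V̂_G are close in large samples:

import numpy as np

rng = np.random.default_rng(2)
lam = 2.0
x = rng.exponential(scale=1 / lam, size=5000)

lam_hat = 1 / x.mean()                       # maximum likelihood estimator
V_H = lam_hat ** 2                           # Hessian-based variance estimator
V_G = 1 / np.mean((1 / lam_hat - x) ** 2)    # outer-product-of-scores estimator
print(lam_hat, V_H, V_G)                     # V_H and V_G nearly coincide
print(np.sqrt(V_H / len(x)))                 # asymptotic standard error of lam_hat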

2)
The score contributions are given in the lecture as

s_i(β, σ²) = ( (ε_i/σ²) x_i ,  −1/(2σ²) + ε_i²/(2σ⁴) )',   where ε_i = y_i − x_i'β.

Now, the information in observation i is

I_i(β, σ²) = E{ s_i(β, σ²) s_i(β, σ²)' }
= E{ [ (ε_i²/σ⁴) x_i x_i'                       (−1/(2σ²) + ε_i²/(2σ⁴)) (ε_i/σ²) x_i
       (−1/(2σ²) + ε_i²/(2σ⁴)) (ε_i/σ²) x_i'    (−1/(2σ²) + ε_i²/(2σ⁴))² ] }
= [ (1/σ²) x_i x_i'    0
    0                  E{ 1/(4σ⁴) − ε_i²/(2σ⁶) + ε_i⁴/(4σ⁸) } ],

where the off-diagonal blocks are zero because the odd moments E{ε_i} and E{ε_i³} vanish.
Using E{ε_i²} = σ² and E{ε_i⁴} = 3σ⁴, the lower right element equals
(3σ⁴ − 2σ⁴ + σ⁴)/(4σ⁸) = 1/(2σ⁴).
Hence

lim_{N→∞} (1/N) Σ_{i=1}^N I_i(β, σ²) = [ (1/σ²) Σ_xx    0
                                          0              1/(2σ⁴) ] = I(β, σ²),

where Σ_xx = lim_{N→∞} (1/N) Σ_{i=1}^N x_i x_i'. Thus, the asymptotic covariance matrix is

V = I(β, σ²)⁻¹ = [ σ² Σ_xx⁻¹    0
                   0            2σ⁴ ],

so that in large samples (since √N(β̂ − β) → N(0, V))

V̂ar(β̂) = (1/N) σ̂² ( (1/N) Σ_{i=1}^N x_i x_i' )⁻¹ = σ̂² ( Σ_{i=1}^N x_i x_i' )⁻¹,

where σ̂² may be the MLE of σ², (1/N) Σ_{i=1}^N e_i², or the unbiased estimate where one divides
by N − K.

3)
a) Use > Models > Nonlinear models > Logit (Probit) > Binary... We get
log likelihood values of -4080.685 for the Logit model and -4079.950 for the Probit
model. Thus, there is nearly no difference in the goodness-of-fit with the Probit
model being slightly better.
b) Use the option > Show slopes at mean in the model specification window. The
marginal effects are very similar which is a quite general result and not specific to
this exercise. Note that the z statistics (which are the equivalents to the t statistics
in a linear model) are also quite similar for both models whereas the coefficients
themselves are scaled differently, i.e. they seem to differ roughly by some constant
factor.
c) We use the Probit model from now on. Go to > File > View as equation in
the model window. Then, start the GRETL console ( > Tools > gretl console)
and copy and paste the equation into the console. Substitute the specific values
of the variables for the variable names, i.e. replace, for instance, age by 40. Now,
define a scalar to calculate the desired probabilities, which can be done by typing
scalar xb1 = -0.256 + 0.375 + 0.0179*40 - 0.193 + 0.474 - 0.600
for the case (i) and analogously for case (ii) (where the scalar may then be named
xb2). Pressing Return gives us the values of xb1 and xb2 (0.516 and 1.116), but
these values are only of the type x_i'b and not yet the desired probabilities. To
compute these we have to calculate Φ(xb1) and Φ(xb2), where Φ(·) is the standard
normal cumulative distribution function. This can be done in GRETL by typing
scalar p1 = cdf(N, xb1)
and likewise for the second probability. cdf() means that we calculate a cumulative
distribution function (CDF) and the first argument (N) specifies this CDF to be
the standard normal distribution. The resulting probabilities are 0.697073 and
0.867789. The difference, 0.170716, should be interpreted as the marginal effect of
leaving self-employment on the probability to be insured for a healthy and married
man, aged 40, since
ΔΦ(x_i'b)/Δself emp_i (here) = (Φ(xb2) − Φ(xb1))/(−1).
Note that the result for this specific case is not far away from, but different to, the slope
at the mean, 0.188240, which was directly reported by GRETL.
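The same calculation can be reproduced outside of GRETL; this sketch (Python with scipy assumed; the coefficients are the rounded values used above) mirrors the console commands:

from scipy.stats import norm

xb1 = -0.256 + 0.375 + 0.0179 * 40 - 0.193 + 0.474 - 0.600   # self-employed
xb2 = xb1 + 0.600                                            # not self-employed
p1, p2 = norm.cdf(xb1), norm.cdf(xb2)
print(p1, p2, p2 - p1)                                       # roughly 0.697, 0.868, 0.171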
d) No. Like in a linear model, we have an endogeneity problem if a regressor and
the dependent variable are simultaneously determined. While a formal definition of
endogeneity in nonlinear models is beyond the scope of this course, it is important
to note that problems like simultaneity and omitted variables are also present in
nonlinear models.
e) age² is highly significant (p value of 0.0015) and both the Akaike criterion (AIC) and
the Schwarz criterion (Bayesian information criterion, BIC) are lower than before.
Thus, we conclude that age² belongs into the model. The marginal effect of age is
∂Φ(x_i'b)/∂age_i = φ(x_i'b)(b_age + 2 b_age2 · age_i) (here) = φ(x_i'b)(0.0482065 − 2 × 0.000389607 · age_i),
where φ(·) denotes the standard normal density.
f) The variable male·married is equal to one for married men and zero otherwise. For
married women only the variable married is equal to one in case of marriage, whereas
for men both married and male·married are equal to one in case of marriage.
Thus, for women the coefficient of married (0.382583) reflects the effect of getting
married on the probability to be insured, while for men this effect is the sum of the
coefficients of married and male·married (0.382583 + 0.134117 = 0.5167). Since
the coefficient of male·married is statistically significant, there are significant
differences between men and women with respect to the effect of marriage on the
probability to be insured.

Course manual Econometrics: Week 11

General information

Unlike in all previous weeks, this week we will be having two lectures and the following
week there will be two exercise meetings. This is due to the fact that the topic of these
two weeks, time series, requires introducing a number of concepts before being able to
treat practically more relevant issues.
This week we start by looking at some basic properties of time-series data. Next, we
extensively treat univariate autoregressive moving average (ARMA) models. This model
class is empirically very useful and popular. We look at many issues related to ARMA
models like representation, parameter restrictions, estimation, model selection and forecasting. Following this, we briefly treat the topic of non-stationarity in time series. In
particular, we look at so-called unit root processes and study how to test for unit roots.

Reading

Obligatory reading: Verbeek, 8.1, 8.2, 8.4 (but skip the KPSS test and 8.4.3-8.4.4), 8.6,
8.7, 8.8.1, and 8.11
Additional reading:
Wooldridge does not really treat ARMA models, although some of his treatments of
regression models with time-series data may be useful. Unit roots are treated in 11.1-11.3.
Greene, Chapter 21.1 and 21.2 for ARMA and 22.1-22.2 for unit roots.
A nice treatment of ARMA models is also given in the book Elements of Forecasting
by Francis Diebold.

Solution to last weeks exercises

1)
We know that 25% of all games are sold out, so that

P(Y* < 20000) = 0.75
Φ((20000 − μ)/σ) = 0.75
(20000 − μ)/σ = Φ⁻¹(0.75) = 0.675
σ = (20000 − μ)/0.675     (1)

Further, by the law of total probability,

E[Y] = E[Y|Y = 20000] · P(Y = 20000) + E[Y|Y < 20000] · P(Y < 20000)
= 20000 · 0.25 + E[Y*|Y* < 20000] · 0.75
= 20000 · 0.25 + (μ − σ · φ((20000 − μ)/σ)/Φ((20000 − μ)/σ)) · 0.75

Inserting the relation (1) and using E[Y] = 18000, we have

18000 = 20000 · 0.25 + (μ − ((20000 − μ)/0.675) · (φ(0.675)/Φ(0.675))) · 0.75
13000 = (μ − (20000 − μ) · 0.628) · 0.75     (using 0.424/0.675 = 0.628)
13000 = 1.221 μ − 9420
μ = (13000 + 9420)/1.221 = 18362
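The algebra can be double-checked numerically; this sketch (Python with scipy assumed, purely illustrative) solves the same equation for μ without the intermediate rounding steps:

from scipy.stats import norm
from scipy.optimize import brentq

def condition(mu):
    sigma = (20000 - mu) / 0.675
    mills = norm.pdf(0.675) / norm.cdf(0.675)     # approximately 0.424
    return 20000 * 0.25 + (mu - sigma * mills) * 0.75 - 18000

print(brentq(condition, 10000, 20000))            # approximately 18362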
2)
a) As expected, high leverage has a significant negative effect on credit ratings while
profitability has a significant positive effect. The other coefficients (cut1, ..., cut6)
are the intercepts. For instance,
P(investment grade rating | x_i) = P(rating ≥ 4 | x_i)
= P(y_i* > cut3 | x_i)
= P(ε_i > cut3 − x_i'β)
= 1 − P(ε_i ≤ cut3 − x_i'β)
= 1 − exp(cut3 − x_i'β)/(1 + exp(cut3 − x_i'β))
= 1/(1 + exp(cut3 − x_i'β))
b)
P(rating = 4 | booklev = 0.5, ebit = 0.1)
= P(cut3 < y_i* ≤ cut4 | booklev = 0.5, ebit = 0.1)
= P(y_i* ≤ cut4 | booklev = 0.5, ebit = 0.1) − P(y_i* ≤ cut3 | booklev = 0.5, ebit = 0.1)
= exp(cut4 − 0.5 β1 − 0.1 β2)/(1 + exp(cut4 − 0.5 β1 − 0.1 β2)) − exp(cut3 − 0.5 β1 − 0.1 β2)/(1 + exp(cut3 − 0.5 β1 − 0.1 β2))
Inserting the estimates (cut3 = −0.407119, cut4 = 1.12352, β̂1 = −4.80696, β̂2 =
8.66795) gives a result of 0.1788.
c) You have implicitly calculated this probability already in part b). It is
P(y_i* > cut3 | booklev = 0.5, ebit = 0.1)
= 1 − P(y_i* ≤ cut3 | booklev = 0.5, ebit = 0.1) = 0.2442
d) The coefficients of booklev and ebit are of similar magnitude. The intercept is
equal to 0.612432 and is thus close to −cut3 from the Ordered Logit model, which
is no coincidence since (according to the Logit model)
P(invgrade = 1) = exp(constant + x_i'β)/(1 + exp(constant + x_i'β))
= 1/(1 + exp(−constant − x_i'β))
= 1/(1 + exp(−0.612432 + 5.31749 booklev − 8.12393 ebit))
From the first to the second equation we divided the numerator and the denominator
by exp(constant + x_i'β). Note the analogy of this equation and the probability for
an investment grade rating from the Ordered Logit model (see a)).
Inserting booklev = 0.5 and ebit = 0.1 gives a probability of 0.2255, which is somewhat lower but quite close to the answer in c).
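A short sketch (Python with numpy assumed; estimates taken from parts b) and c) above) that reproduces the two Ordered Logit probabilities:

import numpy as np

def logistic(z):
    return 1 / (1 + np.exp(-z))

cut3, cut4 = -0.407119, 1.12352
xb = -4.80696 * 0.5 + 8.66795 * 0.1                      # x_i'beta for booklev = 0.5, ebit = 0.1
p_rating_4 = logistic(cut4 - xb) - logistic(cut3 - xb)   # about 0.1788
p_rating_4_or_more = 1 - logistic(cut3 - xb)             # about 0.2442
print(p_rating_4, p_rating_4_or_more)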
3)
a) Go to > Models > Nonlinear models > Count data, and pick the appropriate
distribution. The results are the same as in Verbeek, pp. 228.
b) The assumption of equidispersion underlying the Poisson model is clearly rejected
since the z statistic of the alpha coefficient is equal to 8.125. (z statistics are
calculated the same way as t statistics but are called differently because they do
not follow a t distribution under any set of assumptions.)
c)
∂log E{y_i | x_i}/∂log(R&D_i) = ∂(x_i'β)/∂log(R&D_i) = β_log(R&D) = 0.831478
Thus, the coefficient of log(R&D) is an elasticity.

d) Denote by x̃_i and β̃ the regressors and parameters excluding the USA dummy.
E{y_i | USA = 1, x̃_i} / E{y_i | USA = 0, x̃_i} = exp(β_USA + x̃_i'β̃)/exp(x̃_i'β̃) = exp(β_USA) = 0.55405
Thus, ceteris paribus, firms from the USA are expected to have about 44.6% fewer
patent applications than European firms (0.55405 − 1 = −0.446). Note that if we
want to compare US firms with Japanese firms we have to calculate
E{y_i | USA = 1, JP = 0, x̃_i} / E{y_i | USA = 0, JP = 1, x̃_i} = exp(β_USA + x̃_i'β̃)/exp(β_JP + x̃_i'β̃) = exp(β_USA)/exp(β_JP) = 0.43055,
so that, compared to Japanese firms, US firms are expected to have about 57% fewer
patents.
e) (i)
V{ε_i | x_i} = V{y_i − x_i'β | x_i} = V{y_i | x_i} =(Poisson) E{y_i | x_i} = exp(x_i'β)
It follows that the variance of the error term depends on x_i and is consequently
not constant if the Poisson model holds. One gets a similar result for the negative
binomial model.
(ii)
∂log E{y_i | x_i}/∂log(R&D_i) = ∂log(x_i'β)/∂log(R&D_i) = β_log(R&D)/(x_i'β)
For a US firm in the computers industry with R&D expenditures of 500, x_i'β =
−234.631 + 65.6392 · log(500) + 47.3702 − 56.9641 = 163.697, so that the elasticity
is 65.6392/163.697 = 0.40098. This is quite far away from the result in the negative
binomial model where the elasticity was constant and equal to 0.831478.
f) In the estimation results window go to > Graphs > Fitted, Actual plot >
Actual vs. Fitted. The graph for the linear model shows that the predictions are
rather inappropriate especially for the left hand side of the graph where we have
negative predictions. In contrast, the predictions from the negative binomial model
show a better fit to the actual values (you might want to use the zoom option).
Essentially, the graphs tell us that the specification E{y_i|x_i} = exp(x_i'β) is more
appropriate than E{y_i|x_i} = x_i'β. This is also confirmed by the log likelihood of the
negative binomial model, which is −819.956 and is thus higher than the log likelihood
of the linear (normal) model (log L = −1110.774). Note that we can compare the
log likelihoods since the dependent variable is the same for both models and the
number of parameters is equal as well.

Course manual Econometrics: Week 12

General information

As announced last week, this week we will have two exercise meetings. The solutions to
the exercises will be provided right away, but I nevertheless recommend that you try and
think about these exercises without looking at the solutions.
Also note that this is the last week of regular classes. Next week will be concerned with a
review of the course, the discussion of the mock exam (Probeklausur), general questions
and probably some further advice concerning the exam.

Exercises (Part I)

1)
Consider the AR(2) model
y_t = θ1 y_{t-1} + θ2 y_{t-2} + ε_t
Assume that the process is weakly stationary and that E{ε_t | y_{t-1}} = E{ε_t | y_{t-2}} = 0.
a) Calculate the first- and second-order autocorrelations.
b) Calculate E{y_t | I_{t-1}} and E{y_t | I_{t-2}}. If y_t (but not y_{t-1}) rises by two units, how
does your expectation change about y_{t+2}?
c) Interpret θ1 and θ2.
d) Suppose that θ1 = 0.6 and θ2 = −0.2. Is there negative dependence between y_t and
y_{t-2}?
e) Rewrite your model in terms of Δy_{t-1} and y_{t-2} and interpret the coefficients in the
new model formulation.
2)
Let ε_t be a Gaussian white noise process and consider the moving average process of
order q,
y_t = ε_t + α1 ε_{t-1} + ... + αq ε_{t-q}.
Show that the MA(q) process is stationary for any order q and parameters α1, ..., αq.

3)
Load the dataset ARMA.gdt. The dataset contains 3 series y1, y2 and y3 that are
assumed to be generated by an ARMA(p, q) process. Determine appropriate choices for
the orders p and q, and estimate the corresponding ARMA models.

Exercises (Part II)

1) (adapted from Verbeek)


A researcher uses a sample of 200 quarterly observations on Yt , the number (in 1000s)
of unemployed persons, to model the time series behaviour of the series and to generate
predictions. First, he computes the sample autocorrelation function, with the following
results:

k      1     2     3     4     5     6     7     8     9     10
ρ_k    0.83  0.71  0.60  0.45  0.44  0.35  0.29  0.20  0.11  -0.01

a) Does the above pattern indicate that an autoregressive or moving average representation is more appropriate? Why?
Next, the sample partial autocorrelation function is determined. It is given by

k                          1     2     3      4     5     6      7     8     9      10
partial autocorrelation    0.83  0.16  -0.09  0.05  0.04  -0.05  0.01  0.10  -0.03  -0.01

b) Does the above pattern indicate that an autoregressive or moving average representation is more appropriate? Why?
The researcher decides to estimate, as a first attempt, a first-order autoregressive model
given by
Y_t = δ + θ Y_{t-1} + ε_t.
The estimated value (given by OLS) for θ is 0.83 with a standard error of 0.07.
c) Formulate the hypothesis of a unit root and perform a unit root test based on the
above regression.
d) Perform a test for the null hypothesis that θ = 0.90.
Next, the researcher extends the model to an AR(2), with the following results (standard
errors in parentheses):
Y_t = 50.0 + 0.74 Y_{t-1} + 0.16 Y_{t-2} + ε_t
      (5.67)  (0.07)        (0.07)
e) Would you prefer the AR(2) model to the AR(1) model? How would you check
whether an ARMA(2, 1) model may be more appropriate?
e) Would you prefer the AR(2) model to the AR(1) model? How would you check
whether an ARMA(2, 1) model may be more appropriate?
f) What do the above results tell you about the validity of the unit root test in c)?
g) How would you test for the presence of a unit root in the AR(2) model?
h) From the above estimates, compute an estimate for the average number of unemployed, E{Y_t}.
i) Suppose the last two quarterly unemployment levels for 1996-III and 1996-IV were
550 and 600 respectively. Compute predictions for 1997-I and 1997-II.
j) Can you say anything sensible about the predicted value for the quarter 2023-I?
(and its accuracy?)
2)
Use the dataset SP500.gdt which contains daily returns on the S&P 500 index from
January 1981 to April 1991 (T = 2783). Returns are computed as first differences of the
log of the S&P 500 US stock price index.
a) Estimate an AR(1) model and an AR(2) model by OLS and compare the results.
b) Compute recursive out-of-sample forecasts i) one day ahead and ii) ten days ahead
for the AR(1) model, the AR(2) model, a random walk with drift model (for the
log of the index, i.e. an AR(0) for the returns) and a pure random walk model,
using 2000-2783 as out-of-sample observations. Recursive k-step ahead forecasts
for period t use the sample until period t − k to make forecasts for period t, here
t = 2000, ..., 2783. Evaluate your out-of-sample forecasts in terms of the root mean
squared error,
RMSE_k = ( (1/784) Σ_{t=2000}^{2783} (Ŷ_{t|t-k} − Y_t)² )^{1/2}.
Interpret the results.

c) Compute the share of the variance of out-of-sample S&P returns that you are able
to explain with your best model.

Additional exercises

1)
Work through 8.1, 8.2 and 8.4 of the True/False questions on http://www.econ.kulueven.be/gme/.

Solution to this weeks exercises (Part I)

1)
a)
ρ1 = cov(y_t, y_{t-1})/V(y_t)
cov(y_t, y_{t-1}) = cov(θ1 y_{t-1} + θ2 y_{t-2} + ε_t, y_{t-1})
= θ1 V(y_{t-1}) + θ2 cov(y_{t-2}, y_{t-1})
⇒ cov(y_t, y_{t-1}) = θ1 V(y_{t-1})/(1 − θ2)      (cov(y_{t-2}, y_{t-1}) = cov(y_t, y_{t-1}))
⇒ ρ1 = θ1/(1 − θ2)                                (V(y_t) = V(y_{t-1}))

ρ2 = cov(y_t, y_{t-2})/V(y_t)
cov(y_t, y_{t-2}) = cov(θ1 y_{t-1} + θ2 y_{t-2} + ε_t, y_{t-2})
= θ1 cov(y_{t-1}, y_{t-2}) + θ2 V(y_{t-2})
= θ1 · θ1 V(y_{t-1})/(1 − θ2) + θ2 V(y_{t-2})
⇒ ρ2 = θ1²/(1 − θ2) + θ2

b)
E{y_t | I_{t-1}} = E{θ1 y_{t-1} + θ2 y_{t-2} + ε_t | I_{t-1}}
= θ1 y_{t-1} + θ2 y_{t-2}
E{y_t | I_{t-2}} = E{θ1 y_{t-1} + θ2 y_{t-2} + ε_t | I_{t-2}}
= θ1 E{y_{t-1} | y_{t-2}, y_{t-3}} + θ2 y_{t-2}
= θ1 (θ1 y_{t-2} + θ2 y_{t-3}) + θ2 y_{t-2}
= (θ1² + θ2) y_{t-2} + θ1 θ2 y_{t-3}
If y_t rises by one unit holding y_{t-1} constant, the expected value of y_{t+2} rises by
θ1² + θ2.
c) θ1 is the effect of a one unit increase in y_{t-1} on y_t, holding y_{t-2} constant. θ2 is the
effect of a one unit increase in y_{t-2} on y_t, holding y_{t-1} constant. Note that increasing
y_{t-2} while holding y_{t-1} fixed means that the difference y_{t-1} − y_{t-2} decreases.
d) No. The second-order partial autocorrelation coefficient is negative (equal to −0.2)
but the second-order autocorrelation is ρ2 = 0.6²/(1 + 0.2) − 0.2 = 0.1, so that the dependence
(as measured by the correlation) is positive.
e)
y_t = θ1 y_{t-1} + θ2 y_{t-2} + ε_t
= θ1 (y_{t-1} − y_{t-2}) + (θ1 + θ2) y_{t-2} + ε_t
= γ1 Δy_{t-1} + γ2 y_{t-2} + ε_t
= 0.6 Δy_{t-1} + 0.4 y_{t-2} + ε_t
γ1 = θ1, so that the interpretation is the same as in c). Note that increasing Δy_{t-1}
and increasing y_{t-1} is the same if y_{t-2} is held constant. γ2 = θ1 + θ2 since γ2 is the
effect of a one unit increase in y_{t-2} holding Δy_{t-1} constant, which means that y_{t-2}
and y_{t-1} are increased by one unit. Consequently, the effect of increasing y_{t-1} and
y_{t-2} by one unit each in the original model formulation, θ1 + θ2, is equal to γ2.
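As a check on parts a) and d), one can simulate a long realization of the process (Python with numpy assumed, illustrative only) and compare the sample autocorrelations with the theoretical values:

import numpy as np

rng = np.random.default_rng(3)
theta1, theta2, T = 0.6, -0.2, 100_000
y = np.zeros(T)
for t in range(2, T):
    y[t] = theta1 * y[t - 1] + theta2 * y[t - 2] + rng.normal()

rho1 = np.corrcoef(y[2:], y[1:-1])[0, 1]
rho2 = np.corrcoef(y[2:], y[:-2])[0, 1]
print(rho1, theta1 / (1 - theta2))                   # both about 0.5
print(rho2, theta1 ** 2 / (1 - theta2) + theta2)     # both about 0.1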
2)
For the MA(q) process y_t to be stationary it has to hold for all t that
i) the mean is the same,
ii) the variance is the same,
iii) the autocovariance (and autocorrelation) only depends on the time difference.
To show these properties use that ε_t is a Gaussian white noise process, thus ε_t ~ iid N(0, σ²).
i)
E(y_t) = E(ε_t + α1 ε_{t-1} + ... + αq ε_{t-q})
= E(ε_t) + α1 E(ε_{t-1}) + ... + αq E(ε_{t-q})
= 0
Thus the mean is equal for all t.
ii)
var(y_t) = var(ε_t + Σ_{i=1}^q αi ε_{t-i})
= var(ε_t) + Σ_{i=1}^q αi² var(ε_{t-i}) + 0      (since cov(ε_k, ε_l) = 0 for k ≠ l)
= σ² (1 + Σ_{i=1}^q αi²)
Thus the variance is equal for all t.
iii) Let k > 0 be the time difference. It holds that
cov(y_t, y_{t-k}) = E(y_t y_{t-k}) − E(y_t) E(y_{t-k})
= E( (ε_t + α1 ε_{t-1} + ... + αq ε_{t-q})(ε_{t-k} + α1 ε_{t-k-1} + ... + αq ε_{t-k-q}) ) − 0 · 0
= E( Σ_{i=0}^q Σ_{j=0}^q αi ε_{t-i} αj ε_{t-k-j} ),
with α0 = 1.
If you multiply out the terms in parentheses you get terms of the form αi ε_{t-i} αj ε_{t-k-j}.
When t − i = t − k − j, i.e. i = k + j, it follows that
E(αi ε_{t-i} αj ε_{t-k-j}) = αi α_{i-k} E(ε²_{t-i}) = αi α_{i-k} σ²,
otherwise
E(αi ε_{t-i} αj ε_{t-k-j}) = αi αj E(ε_{t-i}) E(ε_{t-k-j}) = 0,
since then ε_{t-i}, ε_{t-k-j} are independent and E(ε_k) = 0 for all k. From all this it
follows that
cov(y_t, y_{t-k}) = σ² (α0 αk + α1 α_{k+1} + ... + αq α_{k+q})
Thus the covariance depends on k, but not on t. Note that cov(y_t, y_{t-k}) = 0 for
k > q since then αk = α_{k+1} = ... = α_{k+q} = 0.
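The autocovariance formula derived in iii) can be evaluated directly; the following sketch (Python with numpy assumed, illustrative only) computes σ²(α0 αk + α1 α_{k+1} + ...) for an MA(2) example:

import numpy as np

def ma_autocov(alpha, sigma2, k):
    # autocovariance of order k of an MA(q) process with coefficients alpha_1, ..., alpha_q
    a = np.r_[1.0, alpha]                 # alpha_0 = 1
    if k >= len(a):
        return 0.0                        # zero for k > q
    return sigma2 * np.sum(a[: len(a) - k] * a[k:])

alpha = [0.5, -0.3]                       # an MA(2) example
print([ma_autocov(alpha, 1.0, k) for k in range(4)])   # last entry is 0 since k = 3 > q = 2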
3)
A first step is to look at the autocorrelations and partial autocorrelations, which can be
done via Variable → Correlogram in gretl. To estimate the models go to Model →
Time Series → ARIMA, where you can specify the AR and MA lags and choose
between conditional and full Maximum Likelihood. Note that in the estimation results
gretl denotes the AR coefficients by Phi and the MA coefficients by Theta. After you
have fitted a model you can check the residuals for any remaining autocorrelation by going
to Graphs → Residual correlogram.

Solution to this weeks exercises (Part II)

1)
a) The sample autocorrelation function decays only slowly, so that a pure MA model
(at least of some low order) seems to be inappropriate since for an MA(q) model
the autocorrelations of order q + 1, ... are equal to zero.
b) The sample partial autocorrelation function is very close to zero for order three or
more. This points to an AR(1) or AR(2) model since these models imply that the
partial autocorrelations of order two or more, or three or more, respectively, are
equal to zero. In contrast, for an MA(q) model the partial autocorrelation function
is not equal to zero even for order q + 1, ....

c) Rewriting the model gives the Dickey-Fuller test equation:
ΔY_t = δ + (θ − 1) Y_{t-1} + ε_t
The estimate for (θ − 1) is 0.83 − 1 = −0.17 with a standard error of se(θ̂) =
se(θ̂ − 1) = 0.07, giving a Dickey-Fuller test statistic of −0.17/0.07 = −2.4286.
Looking at Table 8.1 in Verbeek we see that we cannot reject the null hypothesis
of a unit root at a 5% significance level since −2.4286 > −2.88.
d) For testing θ = 0.9 we can stick to the classical t test since now the null hypothesis
implies stationarity, in contrast to the unit root test. Given a stationary process, the
assumptions behind the t test (can) hold (heteroscedasticity and/or autocorrelation
might still be an issue) while for a nonstationary process the classical t test is
generally invalid. The t statistic is here (0.83 − 0.9)/0.07 = −1, which does not
allow rejection of H0 at conventional significance levels.
e) The t statistic of the AR(2) coefficient, 0.16/0.07 = 2.286, indicates that Y_{t-2}
should be included in the regression. We can compare the AR(2) model with
an ARMA(2,1) model by checking whether the additional MA coefficient would be significant and/or looking at the information criteria AIC/BIC.
f) If Y_{t-2} belongs into our model (i.e. it has a non-zero true coefficient) the unit root
test as conducted in part c) is invalid since there is autocorrelation in the errors due
to the omission of Y_{t-2} and we implicitly assumed that the errors are white noise.
g) Now we have to use the augmented Dickey-Fuller test with the test equation
ΔY_t = δ + (θ1 + θ2 − 1) Y_{t-1} − θ2 ΔY_{t-1} + ε_t
The test statistic is (θ̂1 + θ̂2 − 1)/se(θ̂1 + θ̂2), which can be compared with the same
critical values as in c). With the information given here, we cannot compute the test
statistic since we do not have côv(θ̂1, θ̂2), which is necessary for the denominator.
h) We have
E{Y_t} = E{δ + θ1 Y_{t-1} + θ2 Y_{t-2} + ε_t}
= δ + θ1 E{Y_{t-1}} + θ2 E{Y_{t-2}}
⇒ E{Y_t} = δ/(1 − θ1 − θ2)
The latter equality follows from E{Y_t} = E{Y_{t-1}} = E{Y_{t-2}}, which holds if we have
stationarity. In the presence of a unit root (which is not unrealistic here, see c))
there is no obvious way to estimate E{Y_t}. Assuming stationarity and plugging in
our parameter estimates we have Ê{Y_t} = 50/(1 − 0.74 − 0.16) = 500.

i) Very similar to part I of this week's exercises, question 1 b), but now with an
intercept, we have E{Y_{t+1} | Y_t, Y_{t-1}} = δ + θ1 Y_t + θ2 Y_{t-1} and
E{Y_{t+2} | Y_t, Y_{t-1}} = E{δ + θ1 Y_{t+1} + θ2 Y_t + ε_{t+2} | Y_t, Y_{t-1}}
= δ + θ1 E{Y_{t+1} | Y_t, Y_{t-1}} + θ2 Y_t
= δ + θ1 (δ + θ1 Y_t + θ2 Y_{t-1}) + θ2 Y_t
= δ (1 + θ1) + (θ1² + θ2) Y_t + θ1 θ2 Y_{t-1}
Note that these conditional expectations are the optimal predictors in a mean
squared error sense. Inserting the parameter estimates and the observations for
1996-III and 1996-IV, our predictors are
Ê{Y_{1997-I} | Y_{1996-IV}, Y_{1996-III}} = 50 + 0.74 · 600 + 0.16 · 550 = 582
and
Ê{Y_{1997-II} | Y_{1996-IV}, Y_{1996-III}} = 50 · (1 + 0.74) + (0.74² + 0.16) · 600 + 0.74 · 0.16 · 550 = 576.68
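The two predictions are easy to verify with a few lines of code (plain Python, illustrative only):

delta, theta1, theta2 = 50.0, 0.74, 0.16
y_1996_iv, y_1996_iii = 600.0, 550.0

f_1997_i = delta + theta1 * y_1996_iv + theta2 * y_1996_iii            # 582
f_1997_ii = (delta * (1 + theta1) + (theta1 ** 2 + theta2) * y_1996_iv
             + theta1 * theta2 * y_1996_iii)                           # 576.68
print(f_1997_i, f_1997_ii)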

j) For the prediction of Y_{2023-I} the information from 1996 will be almost useless.
The prediction will thus be very close to the unconditional expectation E{Y_t} (see
h)). The accuracy of this prediction will of course be considerably lower than the
accuracy of the predictions in part i).
2)
a) For the AR(1) model, the coefficient of Y_{t-1} is 0.0539 with a t ratio of 2.848. For the
AR(2) model, the coefficient of Y_{t-1} is 0.0566 (t ratio of 2.989) and the coefficient of
Y_{t-2} is −0.0487 (t ratio of −2.569). Comparing both models, the significance of Y_{t-2}
in the AR(2) model indicates that Y_{t-2} belongs into the model. Note that we should
not compare the Akaike information criterion and the Schwarz criterion unless the
sample size is exactly the same (the AR(2) model drops the first two observations
whereas the AR(1) model drops only the first observation). Running both models
for the sample 3-2783, we get a lower AIC for the AR(2) model (−17270.45 vs.
−17265.85) while the SC is lower for the AR(1) model (−17253.98 vs. −17252.65).
Thus, the information criteria do not show a clear rank order of the two models in
terms of their likely predictive accuracy.
b) To perform recursive out-of-sample forecasts in gretl, go to > Analysis >
Forecasts in the model window, add no observations, specify the forecast range
(2000-2783), mark > rolling k-step ahead forecast and choose k. gretl then gives
a forecast graph and a forecast table which contains forecast evaluation statistics
like the RMSE at the bottom. This procedure can be used for the AR(0), the AR(1)
and the AR(2) model. For the pure random walk, the forecasted return is always
zero, so that the corresponding RMSE is simply the root mean squared return in
2000-2783. This can be computed by restricting the sample range to 2000-2783
( > Sample > Set range) and looking at the summary statistics of the returns
( > Variable > Summary statistics), which contain the mean and the standard
deviation. From this, we can use the standard formula for the sample variance
to get ( (1/784) Σ_{t=2000}^{2783} Y_t² )^{1/2} = (0.0090820² + 0.00050960²)^{1/2} = 0.0090963. The
following table gives an overview of the results:

          Random Walk   AR(0)       AR(1)       AR(2)
RMSE_1    0.0090963     0.0090788   0.0090812   0.0090824
RMSE_10   0.0090963     0.0090595   0.0090783   0.0090784

Interestingly, the random walk with drift performs best, although the differences
are rather small. We see in this example that the inclusion of variables that seem
to be important in-sample does not necessarily improve predictive accuracy out-of-sample.
c) The share of the variance of the return series explained by the out-of-sample predictions is given by the R² in an OLS regression of Y_t on Ŷ_{t|t-k}. For this, click on the +
sign in the gretl:forecast window to add the time series of out-of-sample predictions
to the dataset. Running the aforementioned OLS regressions for the AR(0) model
results in an R² of 0.004863 (k=10) and 0.005135 (k=1), respectively. Thus the
share of the variance explained by the AR(0) model out-of-sample is about 0.5%,
which is very low and shows that the forecastability of stock returns is minimal
with the methods used here (and arguably with other methods as well).
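For readers who prefer to script the exercise instead of using the GRETL menus, the following sketch (Python with numpy assumed; returns is a hypothetical numpy array holding the 2783 daily returns) computes recursive one-step-ahead AR(1) forecasts and their RMSE in the spirit of part b):

import numpy as np

def recursive_ar1_rmse(returns, start):
    # for each t >= start, estimate an AR(1) by OLS on observations up to t-1
    # and forecast the return in t; report the root mean squared forecast error
    errors = []
    for t in range(start, len(returns)):
        y, ylag = returns[1:t], returns[:t - 1]
        X = np.column_stack([np.ones(len(ylag)), ylag])
        b = np.linalg.lstsq(X, y, rcond=None)[0]
        errors.append(returns[t] - (b[0] + b[1] * returns[t - 1]))
    return np.sqrt(np.mean(np.square(errors)))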
