
Econometrics - Questions and selected answers

Juergen Bracht (Ph.D. Economics, Pittsburgh, U.S.A.)


24 February 2009

Abstract

Tutorial 1 Problems

Problem 1) Suppose that you are asked to conduct a study to determine whether smaller class
sizes improve performance on standardized tests of fourth graders in Scotland. (a) If you could conduct
any experiment you want, what would you do? Be specific. (b) More realistically, suppose you can collect
observational data on several thousand fourth graders. You can obtain the size of their fourth-grade
class and a standardized test score taken at the end of fourth grade. Why might you expect a negative
correlation between class size and test score? (c) Would a negative correlation necessarily show that
smaller class sizes cause better performance? Explain.

Problem 2) Suppose a secondary-school student is preparing to take a university-entrance exam.
Explain why her eventual score is properly viewed as a random variable.

Problem 3) Let X be a random variable distributed as Normal(5,4). Find the probabilities of the
following events:

a) P (X <= 6).

b) P (X > 4).

c) P (|X − 5| > 1).

Tutorial 2 Problems

Problem 1) Let $Y_1, Y_2, Y_3, Y_4$ be independent, identically distributed random variables from a
population with mean $\mu$ and variance $\sigma^2$. Let $\bar{Y} = \frac{1}{4}(Y_1 + Y_2 + Y_3 + Y_4)$ denote the average
of these four random variables.

a) What are the expected value and variance of $\bar{Y}$ in terms of $\mu$ and $\sigma^2$?

b) Now, consider a different estimator of $\mu$: $W = \frac{1}{8} Y_1 + \frac{1}{8} Y_2 + \frac{1}{4} Y_3 + \frac{1}{2} Y_4$. This is an example of a
weighted average of the $Y_i$. Show that $W$ is also an unbiased estimator of $\mu$. Find the variance of $W$.

c) Based on the answers to parts (a) and (b), which estimator do you prefer, $\bar{Y}$ or $W$?

Problem 2) This is a more general version of Problem 1). Let $Y_1, Y_2, \dots, Y_n$ be $n$ pairwise uncorrelated
random variables with common mean $\mu$ and common variance $\sigma^2$. Let $\bar{Y}$ denote the sample average.

a) Define the class of linear estimators of $\mu$ by $W_a = a_1 Y_1 + a_2 Y_2 + \dots + a_n Y_n$, where the $a_i$ are
constants. What restriction on the $a_i$ is needed for $W_a$ to be an unbiased estimator of $\mu$?

b) Find $\mathrm{Var}(W_a)$.

c) For any numbers $a_1, a_2, \dots, a_n$, the following inequality holds:
$(a_1 + a_2 + \dots + a_n)^2 / n \le a_1^2 + a_2^2 + \dots + a_n^2$.
Use this along with parts (a) and (b) to show that $\mathrm{Var}(W_a) \ge \mathrm{Var}(\bar{Y})$ whenever $W_a$ is
unbiased, so that $\bar{Y}$ is the best linear unbiased estimator. Hint: What does the inequality become
when the $a_i$ satisfy the restriction from part (a)?

Tutorial 3 Problem (Difficult)

Consider the standard simple regression model $y = \beta_0 + \beta_1 x + u$ under the Gauss-Markov assumptions.
The usual OLS estimators $\hat{\beta}_0$ and $\hat{\beta}_1$ are unbiased for their respective population parameters.
Let $\tilde{\beta}_1$ be the estimator of $\beta_1$ obtained by assuming the intercept is zero.

a) Find $E(\tilde{\beta}_1)$ in terms of the $x_i$, $\beta_0$ and $\beta_1$. Verify that $\tilde{\beta}_1$ is unbiased for $\beta_1$ when the
population intercept ($\beta_0$) is zero. Are there other cases when $\tilde{\beta}_1$ is unbiased?

b) Find the variance of $\tilde{\beta}_1$. (Hint: The variance does not depend on $\beta_0$.)

c) Show that $\mathrm{Var}(\tilde{\beta}_1) \le \mathrm{Var}(\hat{\beta}_1)$. Hint: For any sample of data, $\sum x_i^2 \ge \sum (x_i - \bar{x})^2$, with
strict inequality unless $\bar{x} = 0$.

d) Comment on the trade-off between bias and variance when choosing between $\tilde{\beta}_1$ and $\hat{\beta}_1$.

Tutorial 4 (Computer Problem)

Use the data in SLEEP75.wf1 from Biddle and Hamermesh (1990), Sleep and the Allocation of Time,
Journal of Political Economy 98, 922-943

1) We study whether there is a trade-off between time spent sleeping per week and the time spent
in paid work. We could use either variable as the dependent variable. For concreteness, estimate the
model $\text{sleep} = \beta_0 + \beta_1\,\text{totwork} + u$, where sleep is minutes spent sleeping at night per week and totwork
is total minutes worked during the week. (1a) Report your results in equation form along with the number
of observations and $R^2$. What does the intercept in this equation mean? (1b) If totwork increases by 2
hours, by how much is sleep estimated to fall? Do you find this to be a large effect?

2) The following model is a simplified version of the multiple regression model used in Biddle and
Hamermesh (1990) to study the trade-off between time spent sleeping and working and to look at other
factors affecting sleep: $\text{sleep} = \beta_0 + \beta_1\,\text{totwork} + \beta_2\,\text{educ} + \beta_3\,\text{age} + u$, where sleep and totwork are measured
in minutes per week and educ and age are measured in years. (2a) If adults trade off sleep for work,
what is the sign of $\beta_1$? (2b) What signs do you think $\beta_2$ and $\beta_3$ will have? (2c) Using the data, the
estimated equation is $\widehat{\text{sleep}} = 3638.25 - 0.148\,\text{totwork} - 11.13\,\text{educ} + 2.20\,\text{age}$, where $n = 706$, $R^2 = 0.113$.
If someone works five more hours per week, by how many minutes is sleep predicted to fall? Is this a
large trade-off? (2d) Discuss the sign and magnitude of the estimated coefficient on educ. (2e) Would
you say totwork, educ and age explain much of the variation in sleep? What other factors might affect
the time spent sleeping? Are these likely to be correlated with totwork?

3) We now report the standard errors (in parentheses below each estimate) along with the estimates:

$\widehat{\text{sleep}} = \underset{(112.28)}{3638.25} - \underset{(0.017)}{0.148}\,\text{totwork} - \underset{(5.88)}{11.13}\,\text{educ} + \underset{(1.45)}{2.20}\,\text{age}$.

(3a) Is either educ or age individually significant at the 5% level against a two-sided
alternative? (3b) Drop educ and age from the equation. Are educ and age jointly significant in the
original equation at the 5% level? Justify your answer. (3c) Does including educ and age in the model
greatly affect the estimated trade-off between sleeping and working? (3d) Suppose that the sleep equation
contains heteroskedasticity. What does this mean about the tests computed in parts (3a) and (3b)?

Tutorial 1 Solutions

Solution 1)

a) Ideally, we could randomly assign students to classes of different sizes. That is, each student is
assigned a different class size without regard to any student characteristics such as ability and family
background. We also would like substantial variation in class sizes.
b) A negative correlation means that larger class size is associated with lower performance. We
might find a negative correlation because larger class size actually hurts performance. However, with
observational data, there are other reasons we might find a negative relationship. For example, children
from more affluent families might be more likely to attend schools with smaller class sizes, and affluent
children generally score better on standardized tests. Another possibility is that, within a school, a head
teacher might assign the better students to smaller classes.
c) Given the potential for confounding factors — some of which are listed in (b) — finding a negative
correlation would not be strong evidence that smaller class sizes actually lead to better performance. Some
way of controlling for the confounding factors is needed, and this is the subject of multiple regression
analysis.

Solution 2)

Before the student takes the exam, we do not know — nor can we predict with certainty — what the
score will be. The actual score depends on numerous factors, many of which we, as observers, cannot
even list, let alone know ahead of time. (The student’s innate ability, how the student feels on exam day,
and which particular questions were asked, are just a few.) The eventual exam score clearly satisfies the
requirements of a random variable.

Solution 3)

a) $P(X \le 6) = P\left[\frac{X-5}{2} \le \frac{6-5}{2}\right] = P(Z \le 0.5) \approx 0.6915$, where $Z$ denotes a Normal(0, 1) random
variable.

b) $P(X > 4) = P\left[\frac{X-5}{2} > \frac{4-5}{2}\right] = P(Z > -0.5) = P(Z < 0.5) \approx 0.6915$.

c) P (|X − 5| > 1) = P (X − 5 > 1) + P (X − 5 < −1)


= P (X > 6) + P (X < 4) ≈ (1 − 0.6915) + (1 − 0.6915) = 0.617.

We used answers from parts (a) and (b).
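
The three probabilities can be checked numerically. The following is a minimal sketch using Python's standard library; it assumes, as the solution does, that Normal(5, 4) means variance 4, so the standard deviation is 2. The variable names are illustrative.

```python
from statistics import NormalDist

# X ~ Normal(mean 5, variance 4), so the standard deviation is 2.
X = NormalDist(mu=5, sigma=2)

p_a = X.cdf(6)                    # a) P(X <= 6)
p_b = 1 - X.cdf(4)                # b) P(X > 4)
p_c = (1 - X.cdf(6)) + X.cdf(4)   # c) P(|X - 5| > 1) = P(X > 6) + P(X < 4)

print(round(p_a, 4), round(p_b, 4), round(p_c, 4))
```

Parts (a) and (b) agree because the normal density is symmetric about its mean of 5.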

Tutorial 2 Solutions

Solution 1)

a) This is a special case of what is covered in the text, with $n = 4$: $E(\bar{Y}) = \mu$ and $\mathrm{Var}(\bar{Y}) = \sigma^2/4$.

b) $E(W) = E(Y_1)/8 + E(Y_2)/8 + E(Y_3)/4 + E(Y_4)/2$
$= \mu[(1/8) + (1/8) + (1/4) + (1/2)] = \mu(1 + 1 + 2 + 4)/8 = \mu$, which shows that $W$ is unbiased.

Because the $Y_i$ are independent, $\mathrm{Var}(W) = \mathrm{Var}(Y_1)/64 + \mathrm{Var}(Y_2)/64 + \mathrm{Var}(Y_3)/16 + \mathrm{Var}(Y_4)/4$
$= \sigma^2[(1/64) + (1/64) + (4/64) + (16/64)] = \sigma^2(22/64) = \sigma^2(11/32)$.

c) Because $11/32 > 8/32 = 1/4$, $\mathrm{Var}(W) > \mathrm{Var}(\bar{Y})$ for any $\sigma^2 > 0$. Both estimators are unbiased, so
$\bar{Y}$ is preferred because it has the smaller variance.
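
The arithmetic in Solution 1 can be verified with exact fractions. This is a small sketch, not part of the original solution; the helper names `mean_factor` and `variance_factor` are made up for illustration. For a weighted average of independent variables with common mean and variance, the expected value is the sum of the weights times the mean, and the variance is the sum of squared weights times the common variance.

```python
from fractions import Fraction

# Weights of the two estimators from Problem 1.
weights_ybar = [Fraction(1, 4)] * 4
weights_w = [Fraction(1, 8), Fraction(1, 8), Fraction(1, 4), Fraction(1, 2)]

def mean_factor(weights):
    """Multiplier on mu in E(sum a_i Y_i); equals 1 for an unbiased estimator."""
    return sum(weights)

def variance_factor(weights):
    """Multiplier on sigma^2 in Var(sum a_i Y_i) for independent Y_i."""
    return sum(a * a for a in weights)

print(mean_factor(weights_ybar), variance_factor(weights_ybar))
print(mean_factor(weights_w), variance_factor(weights_w))
```

Both mean factors equal 1, while the variance factor of W exceeds that of the sample average.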

Solution 2)

a) $E(W_a) = a_1 E(Y_1) + a_2 E(Y_2) + \dots + a_n E(Y_n) = (a_1 + a_2 + \dots + a_n)\mu$. Therefore, we must have
$a_1 + a_2 + \dots + a_n = 1$ for unbiasedness.

b) Because the $Y_i$ are pairwise uncorrelated, $\mathrm{Var}(W_a) = a_1^2 \mathrm{Var}(Y_1) + a_2^2 \mathrm{Var}(Y_2) + \dots + a_n^2 \mathrm{Var}(Y_n) = (a_1^2 + a_2^2 + \dots + a_n^2)\sigma^2$.

c) From the hint, when $a_1 + a_2 + \dots + a_n = 1$ (the condition needed for unbiasedness of $W_a$), we have
$1/n \le a_1^2 + a_2^2 + \dots + a_n^2$. But then $\mathrm{Var}(\bar{Y}) = \sigma^2/n \le \sigma^2(a_1^2 + a_2^2 + \dots + a_n^2) = \mathrm{Var}(W_a)$.
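
As an informal check of part (c), the sketch below draws many random weight vectors that sum to one and confirms that the sum of squared weights never falls below 1/n. The choices here (n = 5, positive weights, 1000 draws, seed 42) are arbitrary assumptions for illustration. A numerical check of this kind does not replace the proof.

```python
import random

def variance_factor(weights):
    """Sum of squared weights: Var(W_a) / sigma^2 for uncorrelated Y_i."""
    return sum(a * a for a in weights)

def random_unbiased_weights(n, rng):
    """Random positive weights rescaled so that they sum to 1 (unbiasedness)."""
    raw = [rng.uniform(0.1, 1.0) for _ in range(n)]
    total = sum(raw)
    return [a / total for a in raw]

rng = random.Random(42)  # fixed seed so the run is reproducible
n = 5

# Gap between Var(W_a)/sigma^2 and Var(Ybar)/sigma^2 = 1/n for each draw.
gaps = [variance_factor(random_unbiased_weights(n, rng)) - 1.0 / n
        for _ in range(1000)]
min_gap = min(gaps)

# Equal weights a_i = 1/n give the sample average, where the bound is attained.
equal_gap = variance_factor([1.0 / n] * n) - 1.0 / n

print(min_gap >= 0, abs(equal_gap) < 1e-12)
```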

Tutorial 3 Solutions
a) Textbook Equation 2.66:

$\tilde{\beta}_1 = \frac{\sum x_i y_i}{\sum x_i^2}$.

Plugging in $y_i = \beta_0 + \beta_1 x_i + u_i$ gives

$\tilde{\beta}_1 = \frac{\sum x_i (\beta_0 + \beta_1 x_i + u_i)}{\sum x_i^2}$.

The numerator can be written as $\beta_0 \sum x_i + \beta_1 \sum x_i^2 + \sum x_i u_i$.

Plug in:

$\tilde{\beta}_1 = \beta_0 \frac{\sum x_i}{\sum x_i^2} + \beta_1 + \frac{\sum x_i u_i}{\sum x_i^2}$.

Conditional on the $x_i$, we have $E(\tilde{\beta}_1) = \beta_0 \frac{\sum x_i}{\sum x_i^2} + \beta_1$ because $E(u_i) = 0$ for all $i$.

Therefore, the bias in $\tilde{\beta}_1$ is given by the first term in the equation. Bias is zero when $\beta_0 = 0$. It is
also zero when $\sum x_i = 0$ (hence $\bar{x} = 0$). In the latter case, regression through the origin is identical to
regression with an intercept.
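b) From the last expression for $\tilde{\beta}_1$ we have, conditional on the $x_i$,

$\mathrm{Var}(\tilde{\beta}_1) = \left(\sum x_i^2\right)^{-2} \mathrm{Var}\left(\sum x_i u_i\right) = \left(\sum x_i^2\right)^{-2} \left(\sum x_i^2\, \mathrm{Var}(u_i)\right) = \left(\sum x_i^2\right)^{-2} \left(\sigma^2 \sum x_i^2\right) = \frac{\sigma^2}{\sum x_i^2}$.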

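c) From (2.57), $\mathrm{Var}(\hat{\beta}_1) = \frac{\sigma^2}{\sum (x_i - \bar{x})^2}$. From the hint, $\sum x_i^2 \ge \sum (x_i - \bar{x})^2$, so $\mathrm{Var}(\tilde{\beta}_1) \le \mathrm{Var}(\hat{\beta}_1)$.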

d) For fixed $n$, the bias of $\tilde{\beta}_1$ increases as $\bar{x}$ increases (holding the sum of the $x_i^2$ fixed). But as $\bar{x}$
increases, the variance of $\hat{\beta}_1$ increases relative to $\mathrm{Var}(\tilde{\beta}_1)$. The bias in $\tilde{\beta}_1$ is also small when $\beta_0$ is
small. Therefore, whether we prefer $\tilde{\beta}_1$ or $\hat{\beta}_1$ on a mean squared error basis depends on the sizes of $\beta_0$,
$\bar{x}$ and $n$ (in addition to the size of $\sum x_i^2$).
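
The bias and variance results in parts (a) to (c) can be illustrated with a small Monte Carlo experiment. The sketch below is not from the original handout: the parameter values ($\beta_0 = 2$, $\beta_1 = 0.5$, $\sigma = 1$), the fixed regressor values, and the function names are all assumptions chosen for illustration, and the errors are drawn as normal for convenience.

```python
import random

def slope_through_origin(x, y):
    """Estimator obtained by assuming a zero intercept: sum(x*y) / sum(x^2)."""
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)

def slope_ols(x, y):
    """Usual OLS slope from a regression that includes an intercept."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    return sxy / sxx

def mean(values):
    return sum(values) / len(values)

def variance(values):
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)

# Illustrative (assumed) population values and fixed regressors.
beta0, beta1, sigma = 2.0, 0.5, 1.0
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]

rng = random.Random(0)  # fixed seed for reproducibility
tilde_draws, hat_draws = [], []
for _ in range(20000):
    y = [beta0 + beta1 * xi + rng.gauss(0.0, sigma) for xi in x]
    tilde_draws.append(slope_through_origin(x, y))
    hat_draws.append(slope_ols(x, y))

# Theoretical values from parts (a), (b) and (c).
sum_x = sum(x)
sum_x2 = sum(xi * xi for xi in x)
xbar = sum_x / len(x)
sxx = sum((xi - xbar) ** 2 for xi in x)
expected_tilde = beta1 + beta0 * sum_x / sum_x2   # E(beta1 tilde)
var_tilde_theory = sigma ** 2 / sum_x2            # Var(beta1 tilde)
var_hat_theory = sigma ** 2 / sxx                 # Var(beta1 hat)

print("mean of through-origin slope:", mean(tilde_draws), "theory:", expected_tilde)
print("mean of OLS slope:", mean(hat_draws), "theory:", beta1)
print("variances:", variance(tilde_draws), variance(hat_draws))
```

With a nonzero intercept and a positive $\bar{x}$, the through-origin slope is biased upward but has the smaller sampling variance, which is the trade-off discussed in part (d).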

Tutorial 4 Solutions
(1a) The estimated equation is $\widehat{\text{sleep}} = 3586.4 - 0.151\,\text{totwork}$ with $n = 706$, $R^2 = 0.103$. The
intercept implies that the estimated amount of sleep per week for someone who does not work is 3586.4
minutes, or about 59.77 hours per week, or about 8.5 hours per night.
(1b) If someone works two more hours per week, then $\Delta\text{totwork} = 120$ and so $\Delta\widehat{\text{sleep}} = -0.151 \times 120 =
-18.12$ (minutes). This is only a few minutes a night.

(2a) If adults trade off sleep for work, more work implies less sleep (other things equal), so $\beta_1 < 0$.
(2b) The signs of $\beta_2$ and $\beta_3$ are not obvious.
(2c) Five more hours of work per week means $\Delta\text{totwork} = 300$ minutes, so sleep is predicted to fall by
$0.148 \times 300 = 44.4$ minutes. For a week, this is not an overwhelming change.
(2d) If we assume the difference between college and high school is four years, the college graduate
sleeps about $11.13 \times 4 = 44.52$ minutes less per week. The effect is quite small.
(2e) Not surprisingly, the three explanatory variables explain only about 11.3% of the variation in
sleep. One important factor in the error term is general health. Another is marital status, and whether
the person has children. Health, for example, would be correlated with totwork.
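
The unit conversions and predicted changes in parts (1a), (1b), (2c) and (2d) are simple arithmetic on the reported coefficients. A short sketch, using only the estimates quoted above (the variable names are illustrative):

```python
# Coefficients as reported in the estimated equations above.
intercept_simple = 3586.4    # minutes of sleep per week when totwork = 0
slope_simple = -0.151        # simple regression coefficient on totwork
slope_multiple = -0.148      # multiple regression coefficient on totwork
educ_coef = -11.13           # multiple regression coefficient on educ

# (1a) Convert the intercept from minutes per week to hours.
hours_per_week = intercept_simple / 60
hours_per_night = hours_per_week / 7

# (1b) Two more hours of work per week = 120 minutes.
change_1b = slope_simple * 120

# (2c) Five more hours of work per week = 300 minutes.
change_2c = slope_multiple * 300

# (2d) Four more years of education.
change_2d = educ_coef * 4

print(round(hours_per_week, 2), round(hours_per_night, 2))
print(round(change_1b, 2), round(change_2c, 2), round(change_2d, 2))
```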

(3a) $df = 706 - 4 = 702$. The standard critical value ($df = \infty$) is 1.96 for a two-tailed test at the
5% level. Now $t_{educ} = -11.13/5.88 = -1.8929$. Because $|t_{educ}| < 1.96$, we fail to reject the null hypothesis at the 5% level. Also,
$t_{age} = 2.20/1.45 = 1.5172$, so age is also statistically insignificant at the 5% level.
(3b) We compute the $R^2$ form of the F statistic for joint significance, using the restricted $R^2 = 0.103$
from (1a): $F = \frac{0.113 - 0.103}{1 - 0.113} \cdot \frac{702}{2} = 3.9572$. The 5% critical value of the $F_{2,702}$ distribution can be
approximated using denominator $df = \infty$: 3.00. Therefore, educ and age are jointly significant at the 5% level.
(In fact, the p value is about 0.019, and so educ and age are jointly significant at the 2% level.)
(3c) Not really. These variables are jointly significant, but including them only changes the coefficient
on totwork from −0.151 to −0.148.
(3d) The t and F statistics that we used assume homoskedasticity. If there is heteroskedasticity in
the equation, the tests are no longer valid.
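
The test statistics in (3a) and (3b) can be reproduced from the reported numbers. The sketch below uses only the standard library. The p value line relies on a closed form for the upper tail of an F distribution that holds only when the numerator degrees of freedom equal 2, namely $P(F_{2,d} > f) = (1 + 2f/d)^{-d/2}$; for other cases a statistics package would be needed. As in the solution, these statistics assume homoskedasticity.

```python
# (3a) t statistics: coefficient divided by its standard error.
t_educ = -11.13 / 5.88
t_age = 2.20 / 1.45
critical_t = 1.96  # two-sided 5% critical value, large df

# (3b) R-squared form of the F statistic for q exclusion restrictions.
r2_unrestricted = 0.113   # model with totwork, educ, age
r2_restricted = 0.103     # model with totwork only
q = 2                     # number of restrictions (educ and age dropped)
df_denominator = 706 - 4  # n minus number of estimated parameters

f_stat = ((r2_unrestricted - r2_restricted) / q) / (
    (1 - r2_unrestricted) / df_denominator
)

# Upper-tail probability; this closed form is valid only because q == 2.
p_value = (1 + q * f_stat / df_denominator) ** (-df_denominator / 2)

print(round(t_educ, 4), abs(t_educ) > critical_t)
print(round(t_age, 4), abs(t_age) > critical_t)
print(round(f_stat, 4), round(p_value, 4))
```

Neither t statistic exceeds 1.96 in absolute value, yet the F statistic exceeds 3.00, so the two variables are jointly but not individually significant at the 5% level.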

Exam #1 Econometrics

1) A justification for job training programs is that they improve worker productivity. Suppose that you
are asked to evaluate whether more job training makes workers more productive. However, rather than
having data on individual workers, you have access to manufacturing firms in Scotland. In particular, for
each firm, you have information on hours of job training per worker (training) and number of nondefective
items produced per worker hour (output).
1a) Carefully state the ceteris paribus thought experiment underlying this policy question (5 marks).
1b) Does it seem likely that a firm’s decision to train its workers will be independent of worker
characteristics? What are some of those measurable and unmeasurable worker characteristics? (5 marks)
1c) Name a factor other than worker characteristics that can affect worker productivity. (3 marks)
1d) If you find a positive correlation between output and training, would you have convincingly
established that job training makes workers more productive? Explain. (7 marks)

2) Briefly explain these terms: Experiment, Binary Random Variable, Normal Distribution, Standard
Normal Distribution, Cumulative Distribution Function, Random Sample, Asymptotic Normality, Central
Limit Theorem, Sampling Distribution, Rejection Region, p value, Sample Average, Sample Correlation
Coefficient, Sample Standard Deviation, Sample Variance, Sampling Variance. (20 marks)

3a) Let $Y_1, Y_2, \dots, Y_n$ be $n$ pairwise uncorrelated random variables with common mean $\mu$ and
common variance $\sigma^2$. Let $\bar{Y}$ denote the sample average. Show that $\bar{Y}$ is an unbiased estimator of the
population mean $\mu$. Verify that $\mathrm{Var}(\bar{Y}) = \sigma^2/n$. (10 marks)
3b) Why is "unbiasedness" an appealing property for an estimator? (5 marks)
3c) What are the weaknesses of "unbiasedness" as a property for an estimator? (5 marks)

4) A researcher investigates what factors affect chief executive officer salaries. Her data set contains
information on 177 chief executives of U.S. corporations from 1990. The variable salary is annual
compensation, in thousands of dollars; the variable ceoten is prior number of years as company CEO;
the variable comten is years with company; the variable sales is firm sales, in millions; the variable val
is market value, in millions; the variable marg is profits as % of sales.

Dependent Variable: log(salary)
Method: Least Squares
Included observations: 177

Model 1: log(salary) = $\beta_0 + \beta_1$ log(sales)

Variable       Coefficient   Std. Error   t-Statistic    Prob.
C                 4.961077     0.199960      24.81039   0.0000
LOG(SALES)        0.224279     0.027129      8.267132   0.0000
R-squared         0.280858

Model 2: log(salary) = $\beta_0 + \beta_1$ log(sales) $+ \beta_2$ log(val) $+ \beta_3$ marg

Variable       Coefficient   Std. Error   t-Statistic    Prob.
C                 4.620690     0.254344      18.16709   0.0000
LOG(SALES)        0.158483     0.039814      3.980590   0.0001
LOG(VAL)          0.112261     0.050393      2.227701   0.0272
MARG             -0.002259     0.002165     -1.043124   0.2983
R-squared         0.303494

Model 3: log(salary) = $\beta_0 + \beta_1$ log(sales) $+ \beta_2$ log(val) $+ \beta_3$ marg $+ \beta_4$ ceoten $+ \beta_5$ comten

Variable       Coefficient   Std. Error   t-Statistic    Prob.
C                 4.571977     0.253466      18.03781   0.0000
LOG(SALES)        0.187787     0.040003      4.694340   0.0000
LOG(VAL)          0.099872     0.049214      2.029345   0.0440
MARG             -0.002211     0.002105     -1.050132   0.2951
CEOTEN            0.017104     0.005540      3.087309   0.0024
COMTEN           -0.009238     0.003337     -2.767983   0.0063
R-squared         0.352537

4a) Comment on the effect of marg on CEO salary. Would you include marg in a final model
explaining CEO compensation in terms of firm performance? Explain. (4 marks)
4b) Does market value have a significant effect? Explain. (4 marks)
4c) Interpret the coefficients on ceoten and comten. Are these explanatory variables statistically
significant? What do the estimates imply? (4 marks)
4d) What do you make of the fact that longer tenure with the company, holding the other factors
fixed, is associated with lower salary? (4 marks)
4e) What is the parameter β 1 ? What do the estimates mean? (4 marks)

Selected answers - Exam #1 Econometrics

1a) One way to pose the question: If two firms, say A and B, are identical in all respects except that
firm A supplies job training one hour per worker more than firm B, by how much would firm A’s output
differ from firm B’s?
1b) Firms are likely to choose job training depending on the characteristics of workers. Some observed
characteristics are years of schooling, years in the workforce and experience in a particular job. Firms
might even discriminate based on age, gender or race. Perhaps firms choose to offer training to more or
less able workers, where “ability” might be difficult to quantify but where a manager has some idea about
the relative abilities of different employees. Moreover, different kinds of workers might be attracted to
firms that offer more job training on average, and this might not be evident to employers.
1c) The amount of capital and technology available to workers would also affect output. So, two firms
with exactly the same kinds of employees would generally have different outputs if they use different
amounts of capital or technology. The quality of managers would also have an effect.
1d) No, unless the amount of training is randomly assigned. The many factors listed in parts (b) and
(c) can contribute to finding a positive correlation between output and training even if job training does
not improve worker productivity.
3a) $E(\bar{Y}) = E\left(\frac{1}{n} \sum Y_i\right) = \frac{1}{n} E\left(\sum Y_i\right) = \frac{1}{n} \sum E(Y_i) = \frac{1}{n} \sum \mu = \frac{1}{n} n\mu = \mu$.
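
The selected answer covers only the expectation. A sketch of the variance step that the question also asks for, using only the assumptions stated in the question (pairwise uncorrelated $Y_i$ with common variance $\sigma^2$), might read:

```latex
\begin{align*}
\mathrm{Var}(\bar{Y})
  &= \mathrm{Var}\left(\frac{1}{n}\sum_{i=1}^{n} Y_i\right)
   = \frac{1}{n^2}\,\mathrm{Var}\left(\sum_{i=1}^{n} Y_i\right) \\
  &= \frac{1}{n^2}\sum_{i=1}^{n} \mathrm{Var}(Y_i)
     \qquad \text{(all covariances are zero)} \\
  &= \frac{1}{n^2}\, n\sigma^2 = \frac{\sigma^2}{n}.
\end{align*}
```

The second line is where pairwise uncorrelatedness is used: without it, the variance of the sum would also contain the covariance terms.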

4a) In Models 2 and 3, the coefficient on marg is negative, although its t statistic is only about −1.
It appears that, once firm sales and market value have been controlled for, profit margin has no
statistically significant effect on CEO salary.
4b) Model 3 controls for the most factors affecting salary. The t statistic on log(val) is about 2.03.
The standard critical value is 1.96. So log(val) is just significant at the 5% level against a two-sided
alternative. Because the coefficient $\beta_2$ is an elasticity (about 0.10), a ceteris paribus 10% increase in
market value is predicted to increase salary by about 1%.
4c-d) These variables are individually significant at a low significance level (both p values are below 0.01).
Another year as CEO raises predicted salary by about 1.7%. Another year with the
company, but not as a CEO, lowers salary by about 0.92%. This finding at first seems surprising but could be
related to the superstar effect: firms that hire CEOs from outside the company often go after a small pool
of highly regarded candidates, and the salaries of these people are bid up. More non-CEO years with the
company make it less likely that the person was hired as an outside manager. Related case: Regression of
log(wage) on experience and tenure.
4e) $\beta_1$ is the elasticity of salary with respect to sales. In Model 3, a 1% increase in sales is associated
with about a 0.19% increase in salary, holding the other factors fixed.
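
The percentage interpretations in 4b) to 4e) follow from the log specification. The sketch below applies them to the Model 3 coefficients from the regression output. The helper names are hypothetical. For log-log terms the coefficient is read as an elasticity (an approximation for small changes). For level regressors such as ceoten and comten, 100 times the coefficient approximates the percentage change, and $100(e^{\beta} - 1)$ gives the exact figure implied by the fitted equation.

```python
import math

# Model 3 coefficient estimates from the regression output.
model3 = {
    "log_sales": 0.187787,
    "log_val": 0.099872,
    "ceoten": 0.017104,
    "comten": -0.009238,
}

def elasticity_effect(beta, pct_change_x):
    """Approximate % change in salary for a given % change in a logged regressor."""
    return beta * pct_change_x

def semi_elasticity_approx(beta):
    """Approximate % change in salary for a one-unit change in a level regressor."""
    return 100 * beta

def semi_elasticity_exact(beta):
    """Exact % change in predicted salary for a one-unit change in a level regressor."""
    return 100 * (math.exp(beta) - 1)

sales_1pct = elasticity_effect(model3["log_sales"], 1)   # 1% more sales
val_10pct = elasticity_effect(model3["log_val"], 10)     # 10% more market value
ceoten_approx = semi_elasticity_approx(model3["ceoten"])
ceoten_exact = semi_elasticity_exact(model3["ceoten"])
comten_approx = semi_elasticity_approx(model3["comten"])
comten_exact = semi_elasticity_exact(model3["comten"])

print(round(sales_1pct, 2), round(val_10pct, 2))
print(round(ceoten_approx, 2), round(ceoten_exact, 2))
print(round(comten_approx, 2), round(comten_exact, 2))
```

For coefficients this small, the approximate and exact percentage effects differ only in the second decimal place.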
