DEPARTMENT OF ECONOMICS
Exam: ECON4135  Applied statistics and econometrics, fall 2004
Date of exam: Wednesday, December 1, 2004
Time for exam: 14:30 - 17:30
The problem set covers 6 pages
Resources allowed:
All written and printed resources, as well as calculators, are allowed
Grades given: A (best), B, C, D, E and F, with E as the weakest passing grade.
Problems
1. Female life expectancy in the best practicing country is plotted against year in Figure 1. A
linear regression model $Y_{year} = \beta_0 + \beta_1 \cdot year + U_{year}$ is fitted to the data by ordinary least
squares; see the Stata output in Exhibit 1. What is the interpretation of $\beta_1$? Is the intercept
estimate directly meaningful? Give a 95% confidence interval for the gain in life
expectancy in one calendar year, and also in 10 calendar years. What is the 99%
confidence interval for the yearly gain in expected life length?
95% CI for $\beta_1$: (0.238, 0.248); the 95% CI for $10\beta_1$ is 10 times that for $\beta_1$: (2.38,
2.48); 99% CI for $\beta_1$: (0.237, 0.249).
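The intervals above can be reproduced from the slope estimate and robust standard error in Exhibit 1. The Python sketch below is illustrative only: it uses normal quantiles as an approximation to the t distribution with 159 degrees of freedom, and the function name `conf_interval` is hypothetical.

```python
from statistics import NormalDist

# Slope estimate and robust standard error for `year`, copied from Exhibit 1.
BETA1_HAT = 0.2429773
SE_BETA1 = 0.0024385


def conf_interval(estimate, std_err, level=0.95, scale=1.0):
    """Two-sided interval for scale * parameter.

    Assumption: a normal quantile is used in place of the t(159) quantile.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    half_width = z * std_err
    return (scale * (estimate - half_width), scale * (estimate + half_width))


ci95_one_year = conf_interval(BETA1_HAT, SE_BETA1, 0.95)
ci95_ten_years = conf_interval(BETA1_HAT, SE_BETA1, 0.95, scale=10.0)
ci99_one_year = conf_interval(BETA1_HAT, SE_BETA1, 0.99)

for label, ci in [("95%, 1 year", ci95_one_year),
                  ("95%, 10 years", ci95_ten_years),
                  ("99%, 1 year", ci99_one_year)]:
    print(f"{label}: ({ci[0]:.3f}, {ci[1]:.3f})")
```

Multiplying both endpoints by 10 is valid because the gain over 10 years is the linear function $10\beta_1$ of the slope.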
2. From Figure 1 there was clearly more variation in the data in the first part of the period
than in the remaining period, and there were outlying observations in the period 1916-1919. What could have caused these patterns? Another pattern is that record life
expectancy is flat over several periods around 1900. Why could that be? The regression
results in Exhibit 1 were calculated with robust standard errors. Why is it a good idea to
calculate robust standard errors in this case?
Few countries gathered statistical data on death rates in the early part of the
period, and those that did produced vital statistics more prone to error and
variability than in the 20th century. 1916-1919: the First World War and the Spanish flu. Flats
over periods: Around 1900, several countries, including Norway, published vital
statistics every fifth year. With the best practicing country in this group, the
series has flats between publication years.
The standard errors for regression coefficients are biased if computed with the
classical method rather than the robust method when there is
heteroscedasticity such as the observed.
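To make the difference between the two methods concrete, the following Python sketch computes a classical and a robust standard error for the slope of a simple regression. It is only an illustration under stated assumptions: the data are simulated (not the Oeppen and Vaupel series), the error standard deviations 3.0 and 0.5 are hypothetical, and the robust formula is the plain HC0 version without the n/(n-k) small-sample factor that Stata applies.

```python
import math
import random


def slope_with_standard_errors(x, y):
    """OLS slope of y on x with classical and robust (HC0) standard errors."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    dx = [xi - x_bar for xi in x]
    sxx = sum(d * d for d in dx)
    slope = sum(d * (yi - y_bar) for d, yi in zip(dx, y)) / sxx
    intercept = y_bar - slope * x_bar
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]

    # Classical: one common error variance, estimated by RSS / (n - 2).
    s2 = sum(e * e for e in resid) / (n - 2)
    se_classical = math.sqrt(s2 / sxx)

    # Robust (HC0): each squared residual is weighted by its own (x - x_bar)^2.
    se_robust = math.sqrt(sum((d * e) ** 2 for d, e in zip(dx, resid))) / sxx
    return slope, se_classical, se_robust


# Hypothetical data: linear trend with noisier observations before 1900.
random.seed(0)
years = list(range(1840, 2001))
life_exp = [-401.4 + 0.243 * yr + random.gauss(0.0, 3.0 if yr < 1900 else 0.5)
            for yr in years]

slope, se_classical, se_robust = slope_with_standard_errors(years, life_exp)
print(f"slope={slope:.4f}  classical SE={se_classical:.4f}  robust SE={se_robust:.4f}")
```

The two standard errors coincide in expectation only when the error variance is unrelated to the regressor; with the variance pattern simulated here they will generally differ.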
3. For a given year from 1841 on, the first difference in record life expectancy is
$D_{year} = Y_{year} - Y_{year-1}$. Figure 2 shows first differences in record life expectancy versus year,
and Exhibit 2 gives summary information for this variable. Comment briefly on Figure 2
in view of Figure 1, and explain how the regression result relates to the mean in Exhibit 2.
Figure 2 shows more variability early in the period (due to more variation
around the regression line), several flats at zero around 1900 (due to the flats in
Figure 1), and variation around a constant level slightly above zero (due to the
linearity in Figure 1). The mean 0.243 in Exhibit 2 estimates the yearly gain in
record life expectancy, which is modelled as $\beta_1$. It agrees with the regression
estimate of 0.243. The standard error obtained from Exhibit 2, $1.097/\sqrt{160} = 0.087$,
does not match the standard error in Exhibit 1 (0.0024). The
discrepancy is due to the former being based on all differences having the
same variance, which certainly is not the case, while the latter is calculated by
the robust method, which does not rely on this assumption.
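The arithmetic behind the comparison can be checked directly from the exhibits. This short Python sketch uses only the numbers printed in Exhibit 2 and Exhibit 1; the variable names are illustrative.

```python
import math

# Summary statistics for D, copied from Exhibit 2.
N_OBS = 160
SD_D = 1.096574

# Classical standard error of a sample mean: s / sqrt(n).
se_mean = SD_D / math.sqrt(N_OBS)

# Robust standard error of the regression slope, copied from Exhibit 1.
SE_SLOPE_ROBUST = 0.0024385

ratio = se_mean / SE_SLOPE_ROBUST
print(f"standard error of mean(D): {se_mean:.4f}")
print(f"ratio to the robust slope standard error: {ratio:.1f}")
```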
4. From 1946 on, $D$ appears to have a rather stable development. An autoregressive model
of order 1 was estimated for this period. What could a rationale be for this model? The
Stata result (in condensed form) is given in Exhibit 3, which means that the estimated
model is $D_{year} = 0.257 - 0.135\, D_{year-1}$. What is the standard error for the autoregressive
coefficient? Is there a significant first-order autoregression in the first differences in
record life expectancy?
The standard error for the autoregressive coefficient is 0.148. With a two-sided
p-value of 0.36 (from the exhibit) when testing for no autocorrelation,
the autoregression coefficient is certainly not statistically different from zero.
The first differences in record life expectancy might thus very well be
uncorrelated.
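As a check on the numbers quoted from Exhibit 3, the z statistic and two-sided p-value can be recomputed from the coefficient and its standard error. The Python sketch below assumes the usual normal reference distribution for the Wald test; the negative sign on the coefficient is inferred from the confidence interval in the exhibit and does not affect the two-sided test.

```python
from statistics import NormalDist

# AR(1) coefficient and standard error, copied from Exhibit 3.
# Sign inferred from the printed confidence interval.
AR1_COEF = -0.1353576
AR1_SE = 0.1480772

# Wald z statistic for H0: no first-order autocorrelation.
z_stat = AR1_COEF / AR1_SE

# Two-sided p-value under the standard normal distribution.
p_two_sided = 2.0 * (1.0 - NormalDist().cdf(abs(z_stat)))

# The square of z corresponds to the "Wald chi2(1)" line in Exhibit 3.
z_squared = z_stat ** 2

print(f"z = {z_stat:.2f}, p = {p_two_sided:.3f}, chi2(1) = {z_squared:.2f}")
```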
5. It is puzzling that record life expectancy has been growing nearly linearly over such a long
period, and indeed seems to continue to grow at about the same pace. Figure 3 is taken
from Oeppen and Vaupel (2000). In the first half of the period, only a few countries had
life expectancy close to record life expectancy (or, in fact, adequate vital statistics). In
more recent years, more and more countries, such as Chile, are getting their vital statistics
in shape, and are catching up with the leading group. That the group of nations with nearly
record life expectancy is growing in number is due to economic and other development in
many nations. Discuss whether the continued growth in record life expectancy could
partly be a statistical consequence of the fact that more and more countries belong to the
group of leading nations regarding female life expectancy.
In a hypothetical situation with life conditions (underlying mortality) not
changing in the group of best practicing countries, but with new countries
joining this group, estimated record life expectancy will tend to grow simply
because the record is the maximum of a larger and larger number of largely
independent random variables. If, say, country $i$ in the group of size $n_t$ in year
$t$ has observed life expectancy $X_{it}$, and the $X_{it}$ are iid with cumulative distribution
function $F$, the record $Y_t = \max(X_{1t}, \ldots, X_{n_t t})$ has distribution
$P(Y_t \le y) = P\left(\bigcap_{i=1}^{n_t} \{X_{it} \le y\}\right) = \prod_{i=1}^{n_t} P(X_{it} \le y) = F(y)^{n_t}$
due to independence. As $n_t$
increases, this certainly decreases for each $y$ such that $0 < F(y) < 1$. The distribution
of record life expectancy is thus moving to the right from year to year. This
formal argument was not required at the exam.
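The effect of group size on the record can be illustrated numerically. The Python sketch below evaluates $F(y)^n$ and the median of the maximum for a few group sizes; the normal distribution with mean 80 and standard deviation 1 is a purely hypothetical choice of $F$, and the function names are illustrative.

```python
from statistics import NormalDist

# Hypothetical distribution F of observed life expectancy in one leading
# country: normal with mean 80 and standard deviation 1 (illustrative only).
F = NormalDist(mu=80.0, sigma=1.0)


def record_cdf(y, n):
    """P(Y <= y) for the maximum Y of n iid draws from F, i.e. F(y)**n."""
    return F.cdf(y) ** n


def record_median(n):
    """Median of the maximum: the y that solves F(y)**n = 0.5."""
    return F.inv_cdf(0.5 ** (1.0 / n))


group_sizes = [1, 5, 10, 20]
cdf_at_81 = [record_cdf(81.0, n) for n in group_sizes]
medians = [record_median(n) for n in group_sizes]

for n, p, m in zip(group_sizes, cdf_at_81, medians):
    print(f"n={n:2d}  P(Y <= 81) = {p:.3f}  median record = {m:.2f}")
```

With the underlying distribution held fixed, the probability that the record stays below a given level falls as the group grows, and the median record rises.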
6. Scholars have made claims of upper limits to female life expectancy. These claims have
been based on a variety of biological, demographic and other grounds. A claim of an upper
limit, say 64.8 years, is that no country will ever have a female life expectancy above that
limit. Oeppen and Vaupel (2000) identify 19 independent such claims or asserted ceilings
on female life expectancy, see Figure 4. The first claim was made by Dublin in 1928, and
the claim was that female life expectancy could not exceed 64.8 years. This was a failure
even when it was made, since the record life expectancy had exceeded the limit already in
1921 (New Zealand had 65.9). Of the 19 claims, 14 have come out as failures by 2002.
For claim $i$ let $t_i$ be the year the claim was made, and let $F_i$ be the binary variable (coded
1 for failure) recording whether the claim has come out as a failure, i.e. has been beaten
by record life expectancy by 2002. Exhibit 4 shows output from two logistic regressions,
both with $F$ as the dependent variable. The first logistic regression had only $t$ as
regressor, while in the second an attempt was made to enter both the year of the claim $t$ and
the lapse time $x = 2002 - t$ as regressors. Interpret the two sets of results. In the second case $t$
was dropped by Stata. Why?
The two results agree since the linear predictors in the two logistic regressions
are identical: $a + bt = a + 2002b - bx$. Here, $b = -0.5955806$ is the coefficient
on $t$ and $a$ is the intercept in the first regression of Exhibit 4. $t$ and $x = 2002 - t$ are perfectly
collinear, and Stata rightly refuses to include both terms in a linear logistic
regression. Otherwise the regression coefficients would not have been identifiable.
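The equivalence of the two parametrizations can be verified numerically from Exhibit 4. In the Python sketch below the estimates are copied from the first regression, with the negative sign on the coefficient for $t$ inferred from its confidence interval; the function names are hypothetical.

```python
import math

# Estimates from the first logistic regression in Exhibit 4 (regressor t).
A_HAT = 1184.766     # intercept
B_HAT = -0.5955806   # coefficient on t; sign inferred from the confidence interval

# Implied parameters when x = 2002 - t replaces t: a + b*t = (a + 2002*b) - b*x.
intercept_x = A_HAT + 2002 * B_HAT
slope_x = -B_HAT


def failure_prob_from_t(t):
    """Fitted failure probability using the year of the claim."""
    return 1.0 / (1.0 + math.exp(-(A_HAT + B_HAT * t)))


def failure_prob_from_x(x):
    """Fitted failure probability using the lapse time x = 2002 - t."""
    return 1.0 / (1.0 + math.exp(-(intercept_x + slope_x * x)))


print(f"implied intercept = {intercept_x:.3f}, implied slope on x = {slope_x:.7f}")
for t in (1928, 1990, 2001):
    print(t, failure_prob_from_t(t), failure_prob_from_x(2002 - t))
```

The implied intercept agrees with the second regression in Exhibit 4 up to rounding of the printed estimates, and the fitted probabilities are the same under either regressor.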
7. To what extent is life expectancy determined by economic variables like GDP per capita?
Suppose you had data on life expectancy e(0) and GDP per capita, Z , for your own
country, or say Norway. Would you think that a regression of the form
$e(0)_{year} = \beta_0 + \beta_1 Z_{year} + U_{year}$ would yield valid results regarding the posed question?
Regard your hypothetical data as the outcome of a quasi-experiment, and discuss potential
threats to internal and external validity.
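Exhibit 1
Regression with robust standard errors                 Number of obs =     161
                                                       F(  1,   159) = 9928.61
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.9821
                                                       Root MSE      =  1.5325

------------------------------------------------------------------------------
             |               Robust
           Y |      Coef.   Std. Err.       t    P>|t|    [95% Conf. Interval]
-------------+----------------------------------------------------------------
        year |   .2429773   .0024385    99.64   0.000     .2381613    .2477933
       _cons |  -401.4199   4.754271   -84.43   0.000    -410.8096   -392.0303
------------------------------------------------------------------------------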
Exhibit 2
    Variable |     Obs        Mean   Std. Dev.        Min        Max
-------------+------------------------------------------------------
           D |     160    .2431875   1.096574  -4.209999   5.060001
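Exhibit 3
Sample:  1946 to 2000                                  Number of obs =      55
                                                       Wald chi2(1)  =    0.84
                                                       Prob > chi2   =  0.3607

------------------------------------------------------------------------------
           D |      Coef.   Std. Err.       z    P>|z|    [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _cons |   .2566701   .0598694     4.29   0.000     .1393282    .3740119
-------------+----------------------------------------------------------------
ar           |
          L1 |  -.1353576   .1480772    -0.91   0.361    -.4255836    .1548684
------------------------------------------------------------------------------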
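Exhibit 4
Logit estimates                                        Number of obs =      19
                                                       LR chi2(1)    =   14.17
                                                       Prob > chi2   =  0.0002
Log likelihood = -3.865933                             Pseudo R2     =  0.6470

------------------------------------------------------------------------------
           F |      Coef.   Std. Err.       z    P>|z|    [95% Conf. Interval]
-------------+----------------------------------------------------------------
           t |  -.5955806   .4474039    -1.33   0.183    -1.472476    .2813149
       _cons |   1184.766   889.9979     1.33   0.183    -559.5983    2929.129
------------------------------------------------------------------------------

note: t dropped
Logit estimates                                        Number of obs =      19
                                                       LR chi2(1)    =   14.17
                                                       Prob > chi2   =  0.0002
Log likelihood = -3.865933                             Pseudo R2     =  0.6470

------------------------------------------------------------------------------
           F |      Coef.   Std. Err.       z    P>|z|    [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   .5955806   .4474039     1.33   0.183    -.2813149    1.472476
       _cons |  -7.586783   5.772359    -1.31   0.189     -18.9004    3.726832
------------------------------------------------------------------------------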
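[Figure 1 plot: record female life expectancy (vertical axis, 40 to 90 years) against year (horizontal axis, 1850 to 2000).]
Figure 1. Female life expectancy (in years) in best practicing country by calendar year.
Source: Oeppen and Vaupel (2000).
[Figure 2 plot: first difference D (vertical axis) against year (horizontal axis, 1850 to 2000).]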
Figure 3. Female life expectancy in five countries compared with the trend in record life
expectancy. Source: Oeppen and Vaupel (2000).
Figure 4. Record female life expectancy from 1840 to the present. The linear-regression trend
is depicted by a bold black line and the extrapolated trend by a dashed gray line. The
horizontal black lines show asserted ceilings on life expectancy, with a short vertical line
indicating the year of publication. The three dashed red lines denote projections of female life
expectancy in Japan published by the United Nations in 1986, 1999, and 2001: It is
encouraging that the U.N. altered its projection so radically between 1999 and 2001. Source: Oeppen
and Vaupel (2001).