
Biometrika (2000), 87, 4, pp. 907-918
© 2000 Biometrika Trust
Printed in Great Britain

Improved heteroscedasticity-consistent covariance matrix estimators
BY FRANCISCO CRIBARI-NETO
Departamento de Estatística, Universidade Federal de Pernambuco, Cidade Universitária, Recife/PE, 50740-540, Brazil
cribari@de.ufpe.br

SILVIA L. P. FERRARI
Departamento de Estatística, Universidade de São Paulo, Caixa Postal 66281, São Paulo/SP, 05315-970, Brazil
sferrari@ime.usp.br

GAUSS M. CORDEIRO
Departamento de Estatística, Universidade Federal da Bahia, Ondina, Salvador/BA, 40170-110, Brazil
gauss@ufba.br

SUMMARY
The heteroscedasticity-consistent covariance matrix estimator proposed by White (1980) is commonly used in practical applications and is implemented in a number of statistical software packages. However, although consistent, it can display substantial bias in small to moderately large samples, as shown by Monte Carlo simulations elsewhere. This paper defines modified White estimators which are approximately bias-free. Numerical results show that the modified estimators display much smaller bias than White's estimator in small samples. We also show that the bias correction leads to some variance inflation. In hypothesis testing based on heteroscedasticity-consistent covariance matrix estimators, numerical results suggest that tests based on the proposed bias-corrected estimators typically display smaller size distortions.

Some key words: Bias correction; Covariance matrix estimation; Heteroscedasticity; Linear regression; White's estimator.

1. INTRODUCTION
A common problem is to estimate reliably the covariance matrix of ordinary least
squares estimators in a linear regression model when heteroscedasticity is suspected. The
idea is to use the ordinary least squares estimator for the regression parameter vector,
which is still unbiased and consistent, albeit no longer efficient, coupled with an estimator
of its covariance matrix which is consistent regardless of whether or not the error variances
are constant. A commonly used covariance matrix estimator is that proposed by White
(1980). White's estimator is consistent under both homoscedasticity and heteroscedasticity
of unknown form, but it can be quite biased when the sample size is small, as shown by
MacKinnon & White (1985) and Cribari-Neto & Zarkos (1999). As noted by Greene (1997, p. 549), resulting asymptotic t ratios are inflated. In particular, substantial downward bias can occur for regression designs containing points of high leverage (Chesher & Jewitt, 1987). The use of White's variance estimator may thus lead one to find spurious relationships between variables. This seems to be the case with some of the regressions in an unpublished University of Maryland report by Bonnie Wilson. For example, using bootstrap standard errors, she found a nonsignificant relationship between stock market development indicators and growth in a case where Levine & Zervos (1998) found significant relationships using the downwardly biased White standard errors.
In this paper we define a sequence of modified White estimators whose $k$th element has bias of order $O(n^{-(k+2)})$, and our simulation results show that the improved estimators can deliver substantial improvements over the original estimator proposed by White.
There are alternative estimators that are also used in practice, although not as often as White's estimator. They include the jackknife estimator, but this estimator is not considered here since our focus lies on the bias correction of the White estimator. For simulation results on the jackknife covariance matrix estimator, see Cribari-Neto & Zarkos (1999) and MacKinnon & White (1985). In particular, the Monte Carlo evidence in Cribari-Neto & Zarkos (1999) shows that the jackknife estimator can be as biased as, and sometimes even more biased than, the White estimator.

2. IMPROVED ESTIMATORS
The linear regression model is defined as $y = X\beta + u$, where $y$ and $u$ are $n$-vectors of responses and errors, respectively, $\beta$ is a $p$-vector of unknown parameters ($p < n$), and $X$ is an $n \times p$ matrix of fixed regressors with full column rank. The errors $u_t$ have mean zero, variances $0 < \sigma_t^2 < \infty$, for $t = 1, \ldots, n$, and are uncorrelated. Then the ordinary least squares estimator of $\beta$ is $\hat\beta = (X'X)^{-1}X'y$, which has mean $\beta$ and covariance matrix $\Psi = PVP'$, where $P = (X'X)^{-1}X'$. Here, $V = \mathrm{cov}(u)$ is a diagonal matrix with the $t$th diagonal element representing the variance $\sigma_t^2$ of $u_t$, for $t = 1, \ldots, n$. Under homoscedasticity, $\sigma_t^2 = \sigma^2$, a strictly positive finite constant, for all $t$, and hence $\mathrm{cov}(\hat\beta) = \sigma^2 (X'X)^{-1}$. White's (1980) covariance matrix estimator, which is consistent under both homoscedasticity and heteroscedasticity, can be written as
$$\hat\Psi = P\hat{V}P', \qquad (1)$$
where $\hat{V} = \mathrm{diag}\{\hat{u}_1^2, \ldots, \hat{u}_n^2\}$ and $\hat{u} = (\hat{u}_1, \ldots, \hat{u}_n)' = (I - H)y$, with $H = X(X'X)^{-1}X'$ being symmetric and idempotent and $I$ denoting the $n \times n$ identity matrix.
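As a quick illustration, the estimator in (1) can be computed in a few lines. The sketch below is ours, written in Python with numpy rather than the Ox language used for the computations reported later in the paper; the function name white_hc0 is an arbitrary choice.

```python
import numpy as np

def white_hc0(X, y):
    """White's (1980) estimator, equation (1): Psi_hat = P Vhat P',
    with P = (X'X)^{-1} X' and Vhat = diag(uhat_1^2, ..., uhat_n^2)."""
    P = np.linalg.solve(X.T @ X, X.T)    # P = (X'X)^{-1} X'
    beta_hat = P @ y                     # ordinary least squares estimate
    uhat = y - X @ beta_hat              # residuals uhat = (I - H) y
    return P @ np.diag(uhat ** 2) @ P.T  # equation (1)
```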
As noted before, the White estimator in (1) can be substantially biased. The derivation of its bias is straightforward. At the outset, note that $\hat{V} = (\hat{u}\hat{u}')_d$, where $(M)_d$ denotes the diagonal matrix formed from the diagonal elements of the matrix $M$. Also,
$$E(\hat{u}\hat{u}') = \mathrm{cov}(\hat{u}) + E(\hat{u})E(\hat{u})' = (I - H)V(I - H) + (I - H)X\beta\beta'X'(I - H).$$
Since $(I - H)X = 0$, it follows that $E(\hat{u}\hat{u}') = (I - H)V(I - H)$. Hence,
$$E(\hat{V}) = \{(I - H)V(I - H)\}_d,$$
and the mean of White's estimator is
$$E(\hat\Psi) = P\{(I - H)V(I - H)\}_d P'.$$
The biases of $\hat{V}$ and $\hat\Psi$ as estimators of $V$ and $\Psi$ are
$$B_{\hat V}(V) = \{(I - H)V(I - H)\}_d - V = \{HV(H - 2I)\}_d, \qquad (2)$$
$$B_{\hat\Psi}(V) = P\{HV(H - 2I)\}_d P', \qquad (3)$$
respectively. It will be shown later that $B_{\hat V}(V) = O(n^{-1})$ and $B_{\hat\Psi}(V) = O(n^{-2})$. Our aim is to define a sequence of corrected estimators for $\Psi$, $\{\hat\Psi^{(k)},\ k = 1, 2, \ldots\}$ say, such that $B_{\hat\Psi^{(k)}}(V) = O(n^{-(k+2)})$. We start by defining a sequence of modified estimators for $V$, $\{\hat V^{(k)},\ k = 1, 2, \ldots\}$ say, such that $B_{\hat V^{(k)}}(V) = O(n^{-(k+1)})$.
Let $M^{(1)}(A) = \{HA(H - 2I)\}_d$, where $H$ is as defined above, $A$ is a diagonal matrix of order $n$, and $I$ is the $n \times n$ identity matrix. Let
$$M^{(2)}(A) = M^{(1)}\{M^{(1)}(A)\}, \quad M^{(3)}(A) = M^{(1)}\{M^{(2)}(A)\}, \quad M^{(4)}(A) = M^{(1)}\{M^{(3)}(A)\},$$
and so on, and $M^{(0)}(A) = A$. Let $A$ and $B$ be two $n \times n$ diagonal matrices, and let $k = 0, 1, 2, \ldots$. Then it is easy to show that the following properties hold.

Property 1. We have $M^{(k)}(A) + M^{(k)}(B) = M^{(k)}(A + B)$.

Property 2. We have $M^{(k)}[\{HA(H - 2I)\}_d] = M^{(k+1)}(A)$.

Property 3. We have $E\{M^{(k)}(A)\} = M^{(k)}\{E(A)\}$.
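For readers who wish to experiment, here is a minimal sketch of the operator $M^{(1)}$ together with a numerical check of Property 1; the toy design matrix, sample size and random seed are our own arbitrary choices.

```python
import numpy as np

def M(A, H):
    """The operator M^{(1)}(A) = {H A (H - 2I)}_d for a diagonal matrix A."""
    n = H.shape[0]
    return np.diag(np.diag(H @ A @ (H - 2 * np.eye(n))))

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(20), rng.uniform(size=20)])  # toy design matrix
H = X @ np.linalg.solve(X.T @ X, X.T)                     # hat matrix
A = np.diag(rng.uniform(size=20))
B = np.diag(rng.uniform(size=20))
# Property 1 (additivity); Property 2 is the definition of composition.
assert np.allclose(M(A, H) + M(B, H), M(A + B, H))
```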
We can now write (2) and (3) as
$$B_{\hat V}(V) = M^{(1)}(V), \qquad (4)$$
$$B_{\hat\Psi}(V) = PM^{(1)}(V)P',$$
respectively. We define a bias-corrected estimator for $V$ as $\hat V^{(1)} = \hat V - B_{\hat V}(\hat V)$. From (4), we have that
$$\hat V^{(1)} = \hat V - M^{(1)}(\hat V).$$
Note that $M^{(1)}(\hat V)$ is the estimated bias of $\hat V$. Unlike $\hat V$, the corrected estimator $\hat V^{(1)}$ is nearly unbiased, any bias in this estimator being only because it uses $M^{(1)}(\hat V)$ instead of the unknown $M^{(1)}(V)$.
The next step is to correct any bias induced by the estimation of $M^{(1)}(V)$. The bias of $\hat V^{(1)}$ can be written as
$$B_{\hat V^{(1)}}(V) = E\{\hat V - M^{(1)}(\hat V)\} - V = B_{\hat V}(V) - E\{M^{(1)}(\hat V)\} = M^{(1)}(V) - E\{M^{(1)}(\hat V) - M^{(1)}(V)\} - M^{(1)}(V) = -M^{(1)}\{E(\hat V) - V\},$$
by Properties 1 and 3. Using Property 2 and (4), we obtain
$$B_{\hat V^{(1)}}(V) = -M^{(2)}(V).$$
We then define a second bias-corrected estimator for $V$ as
$$\hat V^{(2)} = \hat V^{(1)} - B_{\hat V^{(1)}}(\hat V) = \hat V - M^{(1)}(\hat V) + M^{(2)}(\hat V).$$
This procedure may be repeated iteratively, and after $k$ iterations we arrive at
$$\hat V^{(k)} = \sum_{j=0}^{k} (-1)^j M^{(j)}(\hat V) \quad (k = 1, 2, \ldots). \qquad (5)$$
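Equation (5), combined with the transformation $P(\cdot)P'$ used in (7) below, translates directly into code. A minimal numpy sketch under the same conventions as before; the function name is ours.

```python
import numpy as np

def bias_corrected_cov(X, y, k=2):
    """k-th iterated bias-corrected White estimator: equation (5) for
    Vhat^{(k)}, then Psihat^{(k)} = P Vhat^{(k)} P' as in equation (7).
    k = 0 recovers White's estimator (1)."""
    n = X.shape[0]
    P = np.linalg.solve(X.T @ X, X.T)
    H = X @ P
    uhat = y - H @ y
    term = np.diag(uhat ** 2)            # M^{(0)}(Vhat) = Vhat
    V_k = term.copy()
    for j in range(1, k + 1):
        # M^{(j)}(Vhat) = M^{(1)}{M^{(j-1)}(Vhat)}
        term = np.diag(np.diag(H @ term @ (H - 2 * np.eye(n))))
        V_k += (-1) ** j * term
    return P @ V_k @ P.T
```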
The bias of $\hat V^{(k)}$ can be derived using Properties 1-3 and equation (4) as
$$B_{\hat V^{(k)}}(V) = E\left\{\sum_{j=0}^{k} (-1)^j M^{(j)}(\hat V)\right\} - V = \sum_{j=0}^{k} (-1)^j [E\{M^{(j)}(\hat V - V)\} + M^{(j)}(V)] - V = \sum_{j=0}^{k} (-1)^j \{M^{(j+1)}(V) + M^{(j)}(V)\} - V = (-1)^k M^{(k+1)}(V). \qquad (6)$$
We now define a sequence of estimators $\{\hat\Psi^{(k)},\ k = 1, 2, \ldots\}$ for $\Psi$, where
$$\hat\Psi^{(k)} = P\hat V^{(k)}P' \quad (k = 1, 2, \ldots), \qquad (7)$$
whose bias follows from (6) and (7) as
$$B_{\hat\Psi^{(k)}}(V) = (-1)^k PM^{(k+1)}(V)P'. \qquad (8)$$
To compute the orders of the biases of the two sequences of estimators, we shall use the following facts. If $A$ is a diagonal matrix such that $A = O(n^{-r})$ for some $r \geq 0$, then (i) $PAP' = O(n^{-(r+1)})$ and (ii) $M(A) = \{HA(H - 2I)\}_d = O(n^{-(r+1)})$. These follow from the matrices $P = (X'X)^{-1}X'$ and $H = X(X'X)^{-1}X'$ being $O(n^{-1})$. Since $V = O(1)$, we have that $M^{(k+1)}(V) = O(n^{-(k+1)})$, and therefore $B_{\hat V^{(k)}}(V) = O(n^{-(k+1)})$. In order to obtain a similar result for $\hat\Psi^{(k)}$, note that $\hat\Psi^{(k)} = P\hat V^{(k)}P'$, and thus
$$B_{\hat\Psi^{(k)}}(V) = PB_{\hat V^{(k)}}(V)P' = (-1)^k PM^{(k+1)}(V)P'.$$
Since $M^{(k+1)}(V) = O(n^{-(k+1)})$, it is established that
$$B_{\hat\Psi^{(k)}}(V) = O(n^{-(k+2)}).$$
That is, the bias of the $k$th recursively-improved estimator is $O(n^{-(k+2)})$.

For the above result to hold, as $n \to \infty$ the design matrix $X$ must be such that the matrices $P$ and $H$ are of order $O(n^{-1})$. Another condition is that the matrix $V$ must be of order $O(1)$. This means, for example, that the variance function must be bounded at all covariate values.
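For a known $V$, the exact bias (8) can be evaluated directly, which is how the bias figures reported in Section 3 are obtained. A sketch under the same conventions as the earlier ones; setting k = 0 gives the bias (3) of White's estimator.

```python
import numpy as np

def exact_bias(X, V, k):
    """Exact bias of Psihat^{(k)}, equation (8): (-1)^k P M^{(k+1)}(V) P'."""
    n = X.shape[0]
    P = np.linalg.solve(X.T @ X, X.T)
    H = X @ P
    term = V.copy()
    for _ in range(k + 1):               # apply M^{(1)} a total of k+1 times
        term = np.diag(np.diag(H @ term @ (H - 2 * np.eye(n))))
    return (-1) ** k * (P @ term @ P.T)
```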
Finally, we shall derive the variances of $c'\hat\Psi^{(k)}c$, where $c$ is a $p$-vector of constants, for $k = 0, 1, 2, \ldots$. The quantity $c'\hat\Psi^{(k)}c$ represents the $k$th iterated corrected estimator of the variance of $c'\hat\beta$, with $k = 0$ corresponding to White's estimator. Note that $c'\hat\Psi c$ can be written as $\hat{u}'V^*\hat{u}$, where $V^* = \mathrm{diag}\{v_1^2, \ldots, v_n^2\} = (vv')_d$ with $v = P'c$. Hence, $c'\hat\Psi c$ is a quadratic form in the ordinary least squares residuals. Similarly,
$$c'\hat\Psi^{(1)}c = c'\hat\Psi c - c'P\{H\hat{V}(H - 2I)\}_d P'c = \hat{u}'V^*\hat{u} - \sum_{s=1}^{n} \hat{a}_s v_s^2,$$
where $\hat{a}_s$ is the $s$th diagonal element of $\{H\hat{V}(H - 2I)\}_d$. Since $\hat{a}_s = \sum_{t=1}^{n} h_{st}^2 \hat{u}_t^2 - 2h_{ss}\hat{u}_s^2$, where $h_{st}$ is the $(s, t)$ element of the hat matrix $H$, it can be shown that
$$c'\hat\Psi^{(1)}c = \hat{u}'V^*\hat{u} - \sum_{t=1}^{n} \hat{u}_t^2 \left( \sum_{s=1}^{n} h_{st}^2 v_s^2 - 2h_{tt} v_t^2 \right) = \hat{u}'[V^* - \{HV^*(H - 2I)\}_d]\hat{u} = \hat{u}'\{V^* - M^{(1)}(V^*)\}\hat{u}.$$
More generally, it can be shown that
$$c'\hat\Psi^{(k)}c = \hat{u}'Q_k\hat{u},$$
where $Q_k = \sum_{j=0}^{k} (-1)^j M^{(j)}(V^*)$. We also have that
$$c'\hat\Psi^{(k)}c = z'G_k z,$$
where $G_k = V^{1/2}(I - H)Q_k(I - H)V^{1/2}$ is a symmetric matrix of dimension $n$ and $z = V^{-1/2}y$ is an $n \times 1$ random vector with mean $\eta = V^{-1/2}X\beta$ and identity covariance matrix such that $\eta'G_k = 0$. We note that
$$z'G_k z = (z - \eta)'G_k(z - \eta) + 2\eta'G_k(z - \eta) + \eta'G_k\eta,$$
and, since $\eta'G_k = 0$, it follows that
$$c'\hat\Psi^{(k)}c = w'G_k w,$$
where $w = z - \eta = V^{-1/2}u$. Thus,
$$\mathrm{var}(c'\hat\Psi^{(k)}c) = E\{(w'G_k w)^2\} - \{E(w'G_k w)\}^2.$$
We also have that $E(w'G_k w) = \mathrm{tr}(G_k)$. In order to evaluate $E\{(w'G_k w)^2\}$, we use
$$(w'G_k w)^2 = \sum_{q,r,s,t=1}^{n} g^{(k)}_{qr} g^{(k)}_{st} \frac{u_q u_r u_s u_t}{\sigma_q \sigma_r \sigma_s \sigma_t},$$
where $g^{(k)}_{st}$ is the $(s, t)$ element of $G_k$. Hence,
$$\mathrm{var}(c'\hat\Psi^{(k)}c) = \sum_{q,r,s,t=1}^{n} \frac{g^{(k)}_{qr} g^{(k)}_{st} E(u_q u_r u_s u_t)}{\sigma_q \sigma_r \sigma_s \sigma_t} - \{\mathrm{tr}(G_k)\}^2, \qquad (9)$$
assuming that $E(u_q u_r u_s u_t)$ exists for all $q, r, s, t$. Equation (9) gives a general expression for the variance of the estimated variances, White's and its bias-corrected variants, of linear combinations of ordinary least squares estimates in terms of fourth-order joint moments of the errors $u_t$ and the elements of $G_k$.
In the special case where the errors $u_1, \ldots, u_n$ are independent, (9) reduces to
$$\mathrm{var}(c'\hat\Psi^{(k)}c) = \sum_{s=1}^{n} (g^{(k)}_{ss})^2 \frac{\mu_{4s}}{\sigma_s^4} + \sum_{s \neq t} g^{(k)}_{ss} g^{(k)}_{tt} + 2\sum_{s \neq t} (g^{(k)}_{st})^2 - \{\mathrm{tr}(G_k)\}^2 = g_k' \Lambda g_k + 2\,\mathrm{tr}(G_k^2), \qquad (10)$$
where $\mu_{4t} = E(u_t^4)$, $g_k$ is a column vector containing the diagonal elements of $G_k$ and $\Lambda = \mathrm{diag}\{\gamma_{2t}\}$ with $\gamma_{2t} = (\mu_{4t} - 3\sigma_t^4)/\sigma_t^4$, for $t = 1, \ldots, n$, being the coefficient of excess kurtosis of the $t$th error term. Further, if we assume that the errors are normally distributed, $\Lambda = 0$ and $\mathrm{var}(c'\hat\Psi^{(k)}c) = 2\,\mathrm{tr}(G_k^2)$. If the errors are independent and follow a $t$ distribution with $\nu > 4$ degrees of freedom, then (10) yields
$$\mathrm{var}(c'\hat\Psi^{(k)}c) = \frac{6}{\nu - 4}\, g_k' g_k + 2\,\mathrm{tr}(G_k^2).$$
Finally, note that the bias-corrected estimators given here only involve simple operations
on matrices and can therefore be easily computed.
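For instance, expression (10) can be evaluated as follows. The sketch assumes independent errors and takes the diagonal matrix $V$ and the vector of excess kurtosis coefficients $\gamma_{2t}$ as inputs; the function name is ours.

```python
import numpy as np

def var_quadratic(X, V, c, k, gamma2):
    """Equation (10): var(c' Psihat^{(k)} c) = g_k' Lambda g_k + 2 tr(G_k^2),
    with Lambda = diag(gamma2); gamma2 = 0 corresponds to normal errors."""
    n = X.shape[0]
    I = np.eye(n)
    P = np.linalg.solve(X.T @ X, X.T)
    H = X @ P
    v = P.T @ c
    term = np.diag(v ** 2)               # V* = (v v')_d
    Q = term.copy()                      # Q_k = sum_j (-1)^j M^{(j)}(V*)
    for j in range(1, k + 1):
        term = np.diag(np.diag(H @ term @ (H - 2 * I)))
        Q += (-1) ** j * term
    G = np.sqrt(V) @ (I - H) @ Q @ (I - H) @ np.sqrt(V)  # G_k; V is diagonal
    g = np.diag(G)
    return g @ (gamma2 * g) + 2 * np.trace(G @ G)
```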

3. NUMERICAL RESULTS
The numerical results presented below were obtained using a simple regression model of the form $y_t = \beta_1 + \beta_2 x_t + u_t$, for $t = 1, \ldots, n$, where $u_1, \ldots, u_n$ are uncorrelated, each $u_t$ having mean 0 and variance $h(\alpha_1 x_t + \alpha_2 x_t^2)$, $h$ being a variance function. Note that we do not need to specify the error distribution. We have used $\alpha_1 = \alpha_2 = 0, 1.0, 2.0, 2.5$, where zero corresponds to homoscedasticity, and $h(\cdot) = \exp(\cdot)$, corresponding to multiplicative heteroscedasticity. Three sample sizes were considered, namely $n = 20$, 50 and 100. For each sample size, the covariate values were chosen as random draws from a uniform U(0, 1) distribution. All results were obtained using the matrix programming language Ox (Doornik, 1999).

The $x$ values were obtained from a U(0, 1) distribution and their first four sample moments, that is mean, variance, skewness and kurtosis, are (0.53, 0.10, 0.35, 1.56) for $n = 20$, (0.45, 0.09, 0.03, 1.58) for $n = 50$ and (0.46, 0.09, 0.06, 1.77) for $n = 100$. The corresponding population measures are (0.50, 0.08, 0.00, 1.80). Therefore, the sampled covariate values are representative of a uniform distribution.
Table 1(a) displays the total relative bias of the ordinary least squares, White (1980) and the first four bias-corrected White estimators $\hat\Psi^{(1)}$, $\hat\Psi^{(2)}$, $\hat\Psi^{(3)}$ and $\hat\Psi^{(4)}$. The total relative bias is defined as the sum of the absolute values of the relative biases for the estimated variances of $\hat\beta_1$ and $\hat\beta_2$. That is, for each estimator we report
$$\frac{|E\{\widehat{\mathrm{var}}(\hat\beta_1)\} - \mathrm{var}(\hat\beta_1)|}{\mathrm{var}(\hat\beta_1)} + \frac{|E\{\widehat{\mathrm{var}}(\hat\beta_2)\} - \mathrm{var}(\hat\beta_2)|}{\mathrm{var}(\hat\beta_2)},$$
where $\widehat{\mathrm{var}}$ denotes the relevant variance estimator. The exact biases of the White estimator and its bias-corrected variants were obtained using the result in equation (8). The exact bias of the ordinary least squares variance estimator can be shown to be
$$\frac{\mathrm{tr}\{V(I - H)\}}{n - p}(X'X)^{-1} - PVP',$$
where here $p = 2$. Table 1(a) also reports a measure of the degree of heteroscedasticity for each case, namely $\lambda = \max\{\sigma_t^2\}/\min\{\sigma_t^2\}$, for $t = 1, \ldots, n$. Of course, $\lambda = 1$ under homoscedasticity, and the greater $\lambda$, the greater the degree of heteroscedasticity.
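In code, the tabulated quantity can be computed from the exact_bias sketch given in Section 2; the helper below is ours.

```python
import numpy as np

def total_relative_bias(X, V, k):
    """Sum over the regression coefficients of |bias| / true variance for
    the k-th corrected estimator (k = 0 is White's estimator)."""
    P = np.linalg.solve(X.T @ X, X.T)
    Psi = P @ V @ P.T                    # true covariance of beta_hat
    B = exact_bias(X, V, k)              # equation (8); sketch in Section 2
    return np.sum(np.abs(np.diag(B)) / np.diag(Psi))
```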
The figures in Table 1(a) lead to several interesting conclusions. First, as expected, the ordinary least squares estimator works well under homoscedasticity but is considerably biased when the error variances are not equal. For example, its total relative bias for $n = 20$ and $\alpha_1 = \alpha_2 = 1.0$ exceeds 100%. Secondly, even though the numbers in Table 1(a) are given in absolute values, the individual results not reported here reveal that White's estimator always underestimates the true variances. Thirdly, White's estimator displays relatively large biases in some cases. For instance, when $n = 20$, its total relative bias is always around 25%, even under homoscedasticity. Fourthly, the small-sample biases of our corrected estimators are always quite small. For example, whereas the total relative bias of the White estimator for $n = 20$ reaches and exceeds 25%, the corresponding measures for our first bias-corrected estimator are around 4%, the other corrected estimators displaying nearly zero bias. When the sample size equals 100, all four bias-corrected estimators are nearly unbiased, against a bias of approximately 5% to 8% for White's estimator.
A second numerical evaluation introduces points of high leverage in $x$ by selecting the covariate values as random draws from a $t_3$ distribution, which has fat tails. Here, $\alpha_1 = \alpha_2 = 0.00, 0.05, 0.10, 0.15$, with zero again corresponding to homoscedasticity. The results are presented in Table 1(b).
Table 1(b) reveals that the bias of White's covariance matrix estimator increases considerably when there are points of high leverage in the regression matrix.
Table 1. Bias evaluation using model $y_t = \beta_1 + \beta_2 x_t + u_t$, $u_t \sim (0, \exp\{\alpha_1 x_t + \alpha_2 x_t^2\})$, for $t = 1, \ldots, n$. The covariate $x_t$ is a draw from a U(0, 1) or a $t_3$ distribution; $\lambda = \max(\sigma_t^2)/\min(\sigma_t^2)$ is a measure of the degree of heteroscedasticity. The quantities tabulated are the total relative biases of the different estimators

(a) U(0, 1) distribution

   n    α1, α2       λ      OLS    White    BCW1    BCW2    BCW3    BCW4
  20      0.00      1.00   0.000   0.264   0.041   0.008   0.002   0.000
          1.00      6.34   1.085   0.251   0.038   0.007   0.001   0.000
          2.00     40.24   2.375   0.263   0.042   0.007   0.001   0.000
          2.50    101.34   2.747   0.277   0.046   0.008   0.002   0.001
  50      0.00      1.00   0.000   0.099   0.006   0.000   0.000   0.000
          1.00      6.42   0.818   0.109   0.008   0.001   0.000   0.000
          2.00     41.23   1.716   0.130   0.011   0.001   0.000   0.000
          2.50    104.47   1.914   0.142   0.013   0.001   0.000   0.000
 100      0.00      1.00   0.000   0.053   0.002   0.000   0.000   0.000
          1.00      7.35   0.739   0.058   0.002   0.000   0.000   0.000
          2.00     54.09   1.344   0.070   0.003   0.000   0.000   0.000
          2.50    146.68   1.421   0.076   0.004   0.000   0.000   0.000

(b) $t_3$ distribution

   n    α1, α2       λ      OLS    White    BCW1    BCW2    BCW3    BCW4
  20      0.00      1.00   0.000   0.409   0.207   0.151   0.120   0.098
          0.05      5.02   0.649   0.670   0.447   0.353   0.287   0.234
          0.10     25.21   1.034   0.791   0.556   0.458   0.374   0.305
          0.15    126.60   1.391   0.825   0.608   0.506   0.414   0.338
  50      0.00      1.00   0.000   0.176   0.038   0.012   0.004   0.002
          0.05      5.02   0.583   0.294   0.085   0.030   0.011   0.004
          0.10     25.21   0.887   0.405   0.131   0.048   0.019   0.008
          0.15    126.60   1.040   0.497   0.172   0.065   0.026   0.011
 100      0.00      1.00   0.000   0.098   0.015   0.004   0.001   0.000
          0.05      5.02   0.523   0.183   0.040   0.011   0.003   0.001
          0.10     25.21   0.804   0.260   0.064   0.018   0.005   0.002
          0.15    126.60   0.914   0.324   0.084   0.024   0.007   0.002

OLS, ordinary least squares estimator; White, White's (1980) estimator; BCW1, first-order bias-corrected estimator; BCW2, second-order bias-corrected estimator; BCW3, third-order bias-corrected estimator; BCW4, fourth-order bias-corrected estimator.

For example, when $n = 50$ and $\alpha_1 = \alpha_2 = 0.15$, the total relative bias of this estimator reaches 50%. Our corrected estimators also display non-negligible biases when $n$ is small. However, their biases are always much smaller than the bias of the White estimator, and they converge to zero faster as $n$ increases. For instance, when $n = 100$ and $\alpha_1 = \alpha_2 = 0.15$, the aggregate bias of White's estimator is over 30% whereas the corresponding figures for our third- and fourth-order corrected estimators are nearly zero. The first corrected estimator displays total relative bias of 8.4% in this case whereas the second corrected estimator has a bias of only 2.4%.
The presence of high leverage points in the second numerical design can be identified by looking at $h_{tt}$, for $t = 1, \ldots, n$, the diagonal elements of $H$. It can be shown that $0 \leq h_{tt} \leq 1$ for all $t$ and $\sum_{t=1}^{n} h_{tt} = p$. The $h_{tt}$'s therefore have an average value of $p/n$. A general rule of thumb is that values of $h_{tt}$ in excess of two or three times the average are regarded as influential and worthy of further investigation (Judge et al., 1988, p. 893); a sketch of this diagnostic is given after Table 2. As noted by Chesher & Jewitt (1987, p. 1219), the possibility of severe downward bias in the White estimator arises when there are large $h_{tt}$, because the associated least squares residuals have small magnitude on average and the White estimator takes small residuals as evidence of small error variances. Table 2 lists the minimum and maximum values of $h_{tt}$ for the two datasets, along with $2p/n$ and $3p/n$. The minimum and maximum values of $h_{tt}$ are denoted by $h_{\min}$ and $h_{\max}$, respectively. Table 2 reveals the propensity of the $t_3$ distribution design to involve points of high leverage in the design matrix. Note also that the variance function used for the computations presented in Table 1 associates large variances with large $|x|$ values, thus implying that points of high leverage where $|x|$ takes on isolated large values will yield large variances. When $x$ was obtained from a uniform distribution, there was no point of high leverage and large variances were obtained using relatively large values for $\alpha_1$ and $\alpha_2$, these large variances therefore not being associated with influential observations. In contrast, when $x$ was obtained from a $t_3$ distribution, there were points of high leverage and the values of $\alpha_1$ and $\alpha_2$ are small relative to the first design, thus implying that large variances are associated with large $|x|$ values. As noted by a referee, since points of high leverage generate residuals of small magnitude, there is the possibility of very severe bias when the error variance is large at high-leverage points, as in the results for the White estimator in Table 1(b).

Table 2. Influential observations diagnostic

             Uniform covariate       $t_3$ covariate
    n        h_min      h_max       h_min      h_max       2p/n     3p/n
   20        0.055      0.185       0.050      0.545      0.200    0.300
   50        0.020      0.078       0.020      0.250      0.080    0.120
  100        0.010      0.044       0.010      0.179      0.040    0.060
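The diagnostic referred to above is one line of linear algebra; here is a sketch (function name ours) that flags observations against the $3p/n$ rule of thumb.

```python
import numpy as np

def high_leverage_points(X):
    """Return indices t with h_tt > 3p/n; print h_min, h_max, 2p/n, 3p/n."""
    n, p = X.shape
    h = np.diag(X @ np.linalg.solve(X.T @ X, X.T))  # diagonal of the hat matrix
    print(f"h_min={h.min():.3f} h_max={h.max():.3f} "
          f"2p/n={2 * p / n:.3f} 3p/n={3 * p / n:.3f}")
    return np.where(h > 3 * p / n)[0]
```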

A quick comparison between Tables 1(a) and (b) suggests that the presence of high-leverage points has more influence on the finite-sample behaviour of the White estimator and its variants than has the degree of heteroscedasticity itself. For example, when $n = 50$ and the design matrix has no point of high leverage, we observe that under very strong heteroscedasticity, $\lambda = 104.47$, the total relative bias of the White estimator is 14.2%, which is smaller than its bias when the design matrix has points of high leverage but there is no heteroscedasticity at all, namely 17.6%. This pattern is apparent for all sample sizes and all estimators.
Variance estimators of different linear combinations of the ordinary least squares estimator are affected differently by heteroscedasticity. In order to investigate this, we determine the linear combination of the regression parameter estimators that yields the maximal estimated variance bias, that is the $p$-vector $c$ such that $c'c = 1$ and $E\{\widehat{\mathrm{var}}(c'\hat\beta)\} - \mathrm{var}(c'\hat\beta)$ is maximised. Since the bias matrices given in (8) are symmetric, the maximum absolute value of the bias of the estimated variances of linear combinations of the $\hat\beta$'s is the maximum of the absolute values of the eigenvalues of the corresponding bias matrices. This is because $\max_c c'Lc/c'c$, where $L$ is a symmetric $p \times p$ matrix, is given by the largest eigenvalue of $L$; see Amemiya (1985, p. 460). These figures are given in Table 3 for the case where $x$ is obtained from a uniform distribution; note that these biases are not relative. It is clear that the maximum bias of the White estimator is always considerably larger than those of the corrected estimators. For example, when $n = 20$ and $\alpha_1 = \alpha_2 = 2.5$, the maximum bias of White's estimator is 467.6 times greater than that of the four-iteration corrected estimator.
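Computationally, the maximal bias is the spectral radius of the symmetric bias matrix; a sketch reusing the exact_bias function from Section 2 yields the figures given in Table 3 below.

```python
import numpy as np

def maximal_bias(X, V, k):
    """Largest absolute eigenvalue of the bias matrix in equation (8),
    i.e. the maximal |bias| of vhat(c' beta_hat) over unit vectors c."""
    B = exact_bias(X, V, k)              # equation (8); sketch in Section 2
    return np.max(np.abs(np.linalg.eigvalsh(B)))
```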

Table 3. Bias evaluation: absolute values of the maximal bias of $\widehat{\mathrm{var}}(c'\hat\beta)$ for different variance estimators

    n    α1, α2    White     BCW1     BCW2     BCW3     BCW4
   20     0.00     0.090    0.014    0.003    0.001    0.000
          1.00     0.214    0.035    0.007    0.001    0.000
          2.00     0.982    0.180    0.038    0.008    0.002
          2.50     2.338    0.450    0.097    0.022    0.005
   50     0.00     0.014    0.001    0.000    0.000    0.000
          1.00     0.044    0.003    0.000    0.000    0.000
          2.00     0.216    0.020    0.002    0.000    0.000
          2.50     0.509    0.050    0.006    0.001    0.000
  100     0.00     0.004    0.000    0.000    0.000    0.000
          1.00     0.013    0.001    0.000    0.000    0.000
          2.00     0.073    0.004    0.000    0.000    0.000
          2.50     0.179    0.009    0.001    0.000    0.000

White, White's (1980) estimator; BCW1, first-order bias-corrected estimator; BCW2, second-order bias-corrected estimator; BCW3, third-order bias-corrected estimator; BCW4, fourth-order bias-corrected estimator.

It is well known that bias reduction often leads to variance inflation. We have used the result in equation (9) to evaluate numerically the degree of such variance inflation. The results for the uniform covariate case and $c = (0, 1)'$ are given in Table 4; entries are standard deviations. There is indeed a slight increase in variance induced by the iterated bias correction mechanism. For example, when $n = 50$ and $\alpha_1 = \alpha_2 = 2.0$, the standard deviations for the White estimator and its four bias-corrected variants are 1.148, 1.258, 1.270, 1.272 and 1.272, respectively.
An important issue we now address is whether or not the corrected estimators deliver improved inference in finite samples when used in conjunction with quasi-t tests based on asymptotic normal critical values; we wish to evaluate whether the bias reduction more than offsets the variance inflation of the corrected estimators as far as inference on the regression parameters is concerned. Note that by assuming normality one can use a computer-intensive algorithm (Imhof, 1961), as did Chesher & Austin (1991), to obtain a precise approximation to the null distribution of the test statistics. Here we follow MacKinnon & White (1985) and investigate the null rejection rates of the quasi-t statistics using simulation methods. The null hypothesis under test is $H_0\colon \beta_2 = 1$, and the study was conducted under the null hypothesis, using $\beta_1 = 1$ and normal errors. All results are based on 100 000 replications. Figure 1 plots the size distortions, that is estimated null rejection percentages minus the nominal 5%, against different values of $\alpha_1$, $\alpha_2$ for $n = 50$ in the case where $x$ was obtained from a uniform distribution. Figure 1 shows that the tests that employ estimated variances from our bias-corrected estimators display smaller size distortions than the test based on the over-liberal White estimator, while not fully correcting the liberal tendency. Note that gains from high-order iterations of the bias-correction process are negligible; two iterations in the bias-correction scheme are enough.
Table 4. Variance evaluation: standard deviations of the estimated variance of the ordinary least squares estimate of the slope in a simple regression model

    n    α1, α2    White     BCW1     BCW2     BCW3     BCW4
   20     0.00     0.188    0.220    0.227    0.228    0.229
          1.00     0.607    0.713    0.735    0.740    0.742
          2.00     3.393    4.033    4.171    4.204    4.211
          2.50     8.306    9.904   10.255   10.337   10.356
   50     0.00     0.054    0.058    0.058    0.058    0.058
          1.00     0.209    0.228    0.230    0.230    0.230
          2.00     1.148    1.258    1.270    1.272    1.272
          2.50     2.754    3.026    3.057    3.060    3.061
  100     0.00     0.022    0.022    0.022    0.022    0.022
          1.00     0.092    0.096    0.096    0.096    0.096
          2.00     0.546    0.574    0.576    0.576    0.576
          2.50     1.367    1.439    1.444    1.444    1.444

White, White's (1980) estimator; BCW1, first-order bias-corrected estimator; BCW2, second-order bias-corrected estimator; BCW3, third-order bias-corrected estimator; BCW4, fourth-order bias-corrected estimator.

[Figure 1 near here: two panels, (a) n = 20 and (b) n = 50, plotting size distortion (%) against α1 = α2 over 0.0-2.5 for the White-based test (W) and the tests based on the four bias-corrected estimators (BCW1-BCW4).]

Fig. 1. Size distortions of quasi-t tests with nominal significance level 5%: W, White's (1980) estimator; BCW1, first-order bias-corrected estimator; BCW2, second-order bias-corrected estimator; BCW3, third-order bias-corrected estimator; BCW4, fourth-order bias-corrected estimator.

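The quasi-t statistics studied here combine the ordinary least squares estimate with a heteroscedasticity-consistent standard error; a sketch using the bias_corrected_cov function from Section 2 (k = 0 recovers the White-based test):

```python
import numpy as np

def quasi_t(X, y, j, b0, k=2):
    """Quasi-t statistic for H0: beta_j = b0, using the k-th bias-corrected
    covariance estimator; compare |t| with the normal critical value 1.96."""
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
    Psi_k = bias_corrected_cov(X, y, k)  # sketch in Section 2
    return (beta_hat[j] - b0) / np.sqrt(Psi_k[j, j])
```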

4. AN APPLICATION
The dependent variable $y$ is per capita spending on public schools and the independent variables, $x$ and $x^2$, are per capita income by state in 1979 in the United States and its square; income is scaled by $10^{-4}$. The state of Wisconsin was dropped from the sample since it had missing data, and Washington D.C. was included. The data are given in Greene (1997, Table 12.1, p. 541) and their original source is the U.S. Department of Commerce. For these data, $\hat\beta_1 = 832.914$, $\hat\beta_2 = -1834.203$ and $\hat\beta_3 = 1587.042$. The Breusch-Pagan score test for the null hypothesis of homoscedasticity rejects the hypothesis that the error variances are constant at the 1% nominal level.
Table 5 presents the standard errors for these estimates obtained from different covariance matrix estimators. White's standard errors are smaller than those obtained from our bias-corrected estimators, illustrating the fact that White's estimator tends to be too optimistic. Finally, consider the test of the null hypothesis $H_0\colon \beta_3 = 0$ against a two-sided alternative. The quasi-t statistic based on White's standard error equals 1.912, which would lead us to reject the null hypothesis at the 10% level. The quasi-t statistic becomes nonsignificant when we employ standard errors from the bias-corrected covariance matrix estimators. It turns out that a scatterplot shows a satisfactorily linear scatter except for a single high-leverage point, corresponding to Alaska, with $h_{tt} = 0.651$, whereas $3p/n = 0.180$. If we remove Alaska from the sample and rerun the regression and the test of $H_0\colon \beta_3 = 0$, we no longer reject the null hypothesis at the 10% nominal level regardless of the variance estimate used. That is, we cannot reject the linear specification.
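The quasi-t statistics just described follow directly from the entries of Table 5 below, as this small check shows:

```python
beta3_hat = 1587.042                 # estimate of beta_3
se_white = 829.993                   # White standard error (Table 5)
se_bc2 = 1098.539                    # second-order corrected standard error
print(beta3_hat / se_white)          # 1.912 > 1.645: rejected at the 10% level
print(beta3_hat / se_bc2)            # 1.445 < 1.645: no longer significant
```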

Table 5. Standard errors for least squares estimates in the illustrative example. The entries represent standard errors for $\hat\beta_1$, $\hat\beta_2$ and $\hat\beta_3$ in a linear regression of the form $y_t = \beta_1 + \beta_2 x_t + \beta_3 x_t^2 + u_t$, for $t = 1, \ldots, 50$

  Estimator    se($\hat\beta_1$)    se($\hat\beta_2$)    se($\hat\beta_3$)
  OLS               327.292             828.985             519.077
  White             460.892            1243.043             829.993
  BCW1              551.939            1495.051            1001.775
  BCW2              603.904            1638.066            1098.539
  BCW3              641.572            1741.220            1167.942
  BCW4              672.033            1824.419            1223.769

OLS, ordinary least squares estimator; White, White's (1980) estimator; BCW1, first-order bias-corrected estimator; BCW2, second-order bias-corrected estimator; BCW3, third-order bias-corrected estimator; BCW4, fourth-order bias-corrected estimator.

ACKNOWLEDGEMENT
We gratefully acknowledge financial support from the following Brazilian agencies: Conselho Nacional de Desenvolvimento Científico e Tecnológico, Financiadora de Estudos e Projetos, and Fundação de Amparo à Pesquisa do Estado de São Paulo. We also thank Klaus Vasconcellos, Spyros Zarkos and two anonymous referees for comments and suggestions that led to a much improved paper.

REFERENCES
AMEMIYA, T. (1985). Advanced Econometrics. Cambridge, MA: Harvard University Press.
CHESHER, A. & AUSTIN, G. (1991). The finite-sample distributions of heteroskedasticity robust Wald statistics. J. Economet. 47, 153-73.
CHESHER, A. & JEWITT, I. (1987). The bias of a heteroskedasticity consistent covariance matrix estimator. Econometrica 55, 1217-22.
CRIBARI-NETO, F. & ZARKOS, S. G. (1999). Bootstrap methods for heteroskedastic regression models: evidence on estimation and testing. Economet. Rev. 18, 211-28.
DOORNIK, J. A. (1999). Object-Oriented Matrix Programming Using Ox, 3rd ed. London and Oxford: Timberlake Consultants: http://www.nuff.ox.ac.uk/Users/Doornik/.
GREENE, W. H. (1997). Econometric Analysis, 3rd ed. Upper Saddle River, NJ: Prentice Hall.
IMHOF, J. P. (1961). Computing the distribution of a quadratic form in normal variates. Biometrika 48, 419-26.
JUDGE, G. G., HILL, R. C., GRIFFITHS, W. E., LÜTKEPOHL, H. & LEE, T.-C. (1988). Introduction to the Theory and Practice of Econometrics, 2nd ed. New York: Wiley.
LEVINE, R. & ZERVOS, S. (1998). Stock markets, banks, and economic growth. Am. Econ. Rev. 88, 537-58.
MACKINNON, J. G. & WHITE, H. (1985). Some heteroskedasticity-consistent covariance matrix estimators with improved finite sample properties. J. Economet. 29, 305-25.
WHITE, H. (1980). A heteroskedasticity-consistent covariance matrix estimator and a direct test for heteroskedasticity. Econometrica 48, 817-38.

[Received June 1999. Revised May 2000]
