
A Note on Heteroskedasticity Tests in STATA

Consider the following model estimated in STATA using OLS:

. reg price lotsize sqrft bdrms

      Source |       SS       df       MS              Number of obs =      88
-------------+------------------------------           F(  3,    84) =   57.46
       Model |  617130.701     3  205710.234           Prob > F      =  0.0000
    Residual |  300723.805    84   3580.0453           R-squared     =  0.6724
-------------+------------------------------           Adj R-squared =  0.6607
       Total |  917854.506    87  10550.0518           Root MSE      =  59.833

------------------------------------------------------------------------------
       price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     lotsize |   .0020677   .0006421     3.22   0.002     .0007908    .0033446
       sqrft |   .1227782   .0132374     9.28   0.000     .0964541    .1491022
       bdrms |   13.85252   9.010145     1.54   0.128     -4.06514    31.77018
       _cons |  -21.77031   29.47504    -0.74   0.462    -80.38466    36.84404
------------------------------------------------------------------------------

We may suspect that the error term is heteroskedastic, that is, that var(ui) is some function of the xj.
We can test for this in STATA using several different commands. The commands hettest and bpagan both
perform the Breusch-Pagan (aka Cook-Weisberg) test. This is a large-sample test that assumes normality
of the error terms (see footnote 1). The test statistic in both cases is:

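$$BP = \frac{\mathrm{SSE}_1 / 2}{(\mathrm{SSR} / n)^2} \sim \chi^2_k$$

where SSE_1 is the explained sum of squares from a regression of the squared OLS residuals on the
xj, SSR is the residual sum of squares from the regression of y on the xj, n is the number of
observations, and k is the number of xj in the auxiliary regression (k = 3 in this example). The
chi-squared distribution holds in large samples under the null hypothesis of constant variance. This
is equivalent to assuming that the log of the variance of the error term is a linear function of the
xj, or that: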

$$\operatorname{var}(u_i) = \sigma^2 \exp(\beta_0 + \beta_1 x_1 + \dots + \beta_k x_k).$$

The following example shows the computation of this statistic (with hettest, bpagan, and then
manually):

. hettest lotsize sqrft bdrms, rhs

Breusch-Pagan / Cook-Weisberg test for heteroskedasticity
         Ho: Constant variance
         Variables: lotsize sqrft bdrms

         chi2(3)      =    30.02
         Prob > chi2  =   0.0000

. bpagan lotsize sqrft bdrms

Breusch-Pagan LM statistic: 30.02273 Chi-sq( 3) P-value = 1.4e-06

Footnote 1: For the hettest and bpagan commands to be equivalent, use the following syntax for hettest:
hettest [varlist], rhs. This tells STATA that the variance is thought to be a function of the
explanatory variables. Without [varlist] and the “rhs” option, STATA will assume the variance is
thought to be a function of the fitted values.

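
The two forms described in footnote 1 can also be written with the built-in postestimation command. The lines below are a minimal sketch, assuming a Stata release that provides estat hettest (Stata 9 or later, where it supersedes the standalone hettest command) and assuming the housing data from the example are in memory. bpagan appears to be a user-written command rather than part of official Stata, so it may need to be installed separately.

```stata
* Sketch only: assumes the housing data used above are already loaded.
quietly regress price lotsize sqrft bdrms

* Default form: variance modelled as a function of the fitted values
estat hettest

* rhs form: variance modelled as a function of the explanatory variables
estat hettest, rhs
```

After either call the statistic, its degrees of freedom, and the p-value are left in r(chi2), r(df), and r(p), which is convenient when the result is needed in later calculations.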
. predict uhat, resid

. gen uhat2=uhat*uhat

. reg uhat2 lotsize sqrft bdrms

      Source |       SS       df       MS              Number of obs =      88
-------------+------------------------------           F(  3,    84) =    5.34
       Model |   701213780     3   233737927           Prob > F      =  0.0020
    Residual |  3.6775e+09    84  43780003.5           R-squared     =  0.1601
-------------+------------------------------           Adj R-squared =  0.1301
       Total |  4.3787e+09    87  50330276.7           Root MSE      =  6616.6

------------------------------------------------------------------------------
       uhat2 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     lotsize |   .2015209   .0710091     2.84   0.006     .0603116    .3427302
       sqrft |   1.691037    1.46385     1.16   0.251    -1.219989    4.602063
       bdrms |    1041.76    996.381     1.05   0.299    -939.6526    3023.173
       _cons |  -5522.795   3259.478    -1.69   0.094    -12004.62    959.0347
------------------------------------------------------------------------------

So in this case BP = [701213780/2] / [300723.805/88]^2 = 30.02 (the same as hettest and bpagan).
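
Copying sums of squares from the output by hand invites transcription errors, so the same calculation can be done from the results that regress stores. The following is a minimal sketch, assuming the housing data are in memory. The names ehat, ehat2, ssr, n, sse1, and bp are illustrative, and ehat/ehat2 are used so as not to clash with the uhat and uhat2 variables created above.

```stata
* Sketch only: Breusch-Pagan statistic from stored results.
quietly regress price lotsize sqrft bdrms
* Residual sum of squares and sample size from the main regression
scalar ssr = e(rss)
scalar n   = e(N)

* Squared OLS residuals
predict double ehat, residuals
generate double ehat2 = ehat^2

* Auxiliary regression; e(mss) is its explained (model) sum of squares
quietly regress ehat2 lotsize sqrft bdrms
scalar sse1 = e(mss)

scalar bp = (sse1/2) / (ssr/n)^2
display "BP = " %9.4f bp "   p-value = " %9.6f chi2tail(e(df_m), bp)
```

Here e(df_m) is the number of regressors in the auxiliary regression, which supplies the degrees of freedom for the chi-squared reference distribution.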

The Koenker (1983) variation on this test, called the BP test in Wooldridge (2003), assumes that
the variance of the error term is linear in the xj. This BP statistic must be calculated manually by
first obtaining the squared OLS residuals (uhat2 above), and then regressing the squared residuals
on all of the explanatory variables (also shown above). The LM statistic is then $n R^2_{\hat{u}^2}$,
where the R-squared comes from the regression above. Here, the BP (Koenker variation) statistic is:

BP = 88 * 0.1601 = 14.0888

You can compute the p-value for this statistic in STATA as follows:

. display 1-chi2(3,14.088)
.00278778
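
The statistic and its p-value can also be taken straight from the stored results of the auxiliary regression. This is a minimal sketch, assuming uhat2 (the squared OLS residuals generated above) is still in memory. The scalar names lm and p are illustrative. Because e(r2) is not rounded, the result may differ from 14.0888 in the later decimals. The last two lines assume a Stata release in which estat hettest accepts the iid option, which reports the n*R-squared form of the test.

```stata
* Sketch only: Koenker (n*R-squared) version from stored results.
quietly regress uhat2 lotsize sqrft bdrms
scalar lm = e(N)*e(r2)

* chi2tail(df, x) returns the upper-tail probability, i.e. 1 - chi2(df, x)
scalar p = chi2tail(e(df_m), lm)
display "LM = " %9.4f lm "   p-value = " %9.6f p

* Built-in alternative (assumes the iid option is available)
quietly regress price lotsize sqrft bdrms
estat hettest, rhs iid
```

Using chi2tail() rather than 1-chi2() avoids loss of precision when the p-value is very small.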

The White test (whitetst) regresses the squared OLS residuals on all explanatory variables, all
squared explanatory variables, and all cross-products of the explanatory variables. The LM statistic
is then $n R^2_{\hat{u}^2}$, where the R-squared comes from this regression. This is shown below.

. whitetst

White's general test statistic: 33.73166 Chi-sq( 9) P-value = 1.0e-04
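
The auxiliary regression behind this statistic can be built by hand, which makes the nine degrees of freedom explicit (three levels, three squares, three cross-products). This is a minimal sketch, assuming uhat2 (the squared OLS residuals from the price regression) is in memory. All new variable names and the scalar white are illustrative. whitetst appears to be a user-written command, so the last two lines show the built-in estat imtest, white as an alternative, assuming a Stata release that provides it.

```stata
* Sketch only: White's test computed manually.
* Squares of the explanatory variables
generate double lotsize_sq = lotsize^2
generate double sqrft_sq   = sqrft^2
generate double bdrms_sq   = bdrms^2

* Cross-products of the explanatory variables
generate double lot_x_sqft = lotsize*sqrft
generate double lot_x_bdr  = lotsize*bdrms
generate double sqft_x_bdr = sqrft*bdrms

* Auxiliary regression on levels, squares, and cross-products
quietly regress uhat2 lotsize sqrft bdrms lotsize_sq sqrft_sq bdrms_sq lot_x_sqft lot_x_bdr sqft_x_bdr
scalar white = e(N)*e(r2)
display "White LM = " %9.4f white "   df = " e(df_m) "   p-value = " %9.6f chi2tail(e(df_m), white)

* Built-in alternative, run after the original regression
quietly regress price lotsize sqrft bdrms
estat imtest, white
```

If any of the generated terms were collinear, Stata would drop them and e(df_m) would fall below nine, so it is worth checking the reported degrees of freedom.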

It can be shown that the White test is a special case of the Koenker version of the Breusch-Pagan
test. For more information, see Waldman (1983) (footnote 2).

Footnote 2: Waldman, D.M. (1983) “A Note on the Algebraic Equivalence of White’s Test and a Variation
of the Godfrey/Breusch-Pagan Test for Heteroskedasticity.” Economics Letters v. 13, pp. 197-200.