See discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/264700780
All content following this page was uploaded by Hokwon Anthony Cho on 13 August 2014.
Preface ix
1 Introduction 1
1.1 A Brief History of Regression . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1.1 Genealogy of Regression . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1.2 The Method of Least Squares . . . . . . . . . . . . . . . . . . . . . 2
1.2 Typical Applications of Regression Analysis . . . . . . . . . . . . . . . . . 2
1.2.1 Use of Regression Analysis . . . . . . . . . . . . . . . . . . . . . . 2
1.2.2 Data Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.3 Computer Usage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
Appendix 413
Bibliography 421
Index 429
Chapter 1
Introduction
Regression analysis is a collection of statistical techniques that serve as a basis for drawing inferences about relationships among interrelated variables. Since these techniques are applicable in almost every field of study, including the social, physical and biological sciences, business and engineering, regression analysis is now perhaps the most used of all data analysis methods. Hence, the goal of this text is to develop the basic theory of this important statistical method and to illustrate the theory with a variety of examples chosen from economics, demography, engineering and biology. To make the text relatively self-contained, we have included basic material from statistics, linear algebra and numerical analysis. In addition, in contrast to other books on this topic [27, 87], we have attempted to provide details of the theory rather than just presenting computational and interpretive aspects.
Chapter 9
Multicollinearity: Diagnosis
and Remedies
9.1 Introduction
As we pointed out in Chapter 5, one of the more difficult problems that occurs in doing a regression analysis is that of dealing with multicollinearity. In Example 5.17 we illustrated some of the consequences of this phenomenon, namely the difficulty in interpreting the apparently contradictory facts of a good overall fit, as indicated by a large value of R² and a significant observed F ratio, together with insignificant values for the individual t statistics, while in [76] it was shown how multicollinearity can seriously affect the accuracy of regression calculations. In addition, some of the problems with variable selection techniques that we observed in Chapter 8 are often associated with strong multicollinearity in the data.
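The contradictory symptoms described above are easy to reproduce numerically. The following sketch (my own illustration, not an example from the text; all variable names are mine) fits an ordinary least-squares model to two nearly collinear predictors and prints the overall fit, the individual t statistics, and the condition number of the design matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50

# Two nearly collinear predictors: x2 is x1 plus a little noise.
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.01, size=n)
y = 1.0 + x1 + x2 + rng.normal(scale=0.5, size=n)

X = np.column_stack([np.ones(n), x1, x2])
b, *_ = np.linalg.lstsq(X, y, rcond=None)

resid = y - X @ b
p = X.shape[1]
s2 = resid @ resid / (n - p)           # residual variance estimate
cov_b = s2 * np.linalg.inv(X.T @ X)    # Var(b) = s^2 (X'X)^{-1}
t_stats = b / np.sqrt(np.diag(cov_b))

ss_tot = np.sum((y - y.mean()) ** 2)
r2 = 1 - resid @ resid / ss_tot

print("R^2:", r2)                          # close to 1: good overall fit
print("t statistics:", t_stats)            # slope t's typically small here
print("condition number:", np.linalg.cond(X))
```

Despite the near-perfect overall fit, the variances of the individual slope estimates are inflated so much that neither slope is individually significant, which is exactly the pattern noted in Example 5.17.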
In this chapter we will examine this problem in greater detail. In particular, we will discuss methods for detecting multicollinearity, elaborate on its statistical consequences and examine some of the proposed remedies, particularly the method of ridge regression. This latter technique, which has spawned volumes of research in the past 25 years [28, 58, 59], is still controversial, as are other forms of biased estimation [8, 116], and is by no means espoused by all statisticians, as the work of Draper and Van Nostrand [28] and Draper and Smith [27] shows. However, because of its prominent role in current research and applications, no modern treatment of regression analysis would be complete without some discussion of its use. At present, it is fair to say that many of the computational problems associated with multicollinearity have been overcome through the development of more sophisticated computational techniques [5, 112], such as the QR decomposition, and the advent of powerful computers. The problem of building and interpreting models in the presence of multicollinearity, however, is far from totally resolved [116] and perhaps never will be.
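The two remedies mentioned in this chapter, ridge regression and QR-based computation, can be combined in a few lines. A standard identity writes the ridge estimate b(k) = (X'X + kI)⁻¹X'y as the ordinary least-squares solution of an augmented system, which can then be solved stably by QR. The sketch below is my own illustration of that idea under those standard definitions (standardizing the predictors first is common practice but omitted here for brevity):

```python
import numpy as np

def ridge(X, y, k):
    """Ridge estimate b(k) = (X'X + kI)^{-1} X'y, computed by solving the
    augmented least-squares problem min ||[y; 0] - [X; sqrt(k) I] b||^2
    via a QR decomposition for numerical stability."""
    n, p = X.shape
    X_aug = np.vstack([X, np.sqrt(k) * np.eye(p)])
    y_aug = np.concatenate([y, np.zeros(p)])
    Q, R = np.linalg.qr(X_aug)
    return np.linalg.solve(R, Q.T @ y_aug)

rng = np.random.default_rng(1)
n = 40
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.01, size=n)   # nearly collinear with x1
X = np.column_stack([x1, x2])
y = x1 + x2 + rng.normal(scale=0.5, size=n)

b_ols = ridge(X, y, 0.0)     # k = 0 recovers ordinary least squares
b_ridge = ridge(X, y, 0.1)   # a small ridge constant shrinks the estimate
print("OLS:  ", b_ols)
print("ridge:", b_ridge)
```

Because ||b(k)|| is nonincreasing in k, the ridge coefficients are never longer than the OLS ones; with collinear data the shrinkage is typically dramatic for the individual coefficients while their sum, the well-estimated combined effect, is nearly unchanged.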
Appendix
Statistical Tables
Bibliography
[1] Aitkin, M. A. (1974). Simultaneous inference and the choice of variable subsets. Technometrics, 16, 221-227.
[2] Allen, D. M. (1971). Mean square error of prediction as a criterion for selecting variables. Technometrics, 13, 469-475.
[3] Andrews, D. F. (1974). A robust method for multiple linear regression. Technometrics, 16, 523-531.
[4] Anscombe, F. J. and Tukey, J. W. (1963). The examination and analysis of residuals. Technometrics, 5, 141-160.
[5] Atkinson, A. C. (1982). Regression diagnostics, transformations and constructed variables. Journal of the Royal Statistical Society, Series B, 44, 1-36.
[6] Atkinson, A. C. (1983). Diagnostic regression for shifted power transformations. Technometrics, 25, 23-33.
[7] Berk, K. N. (1978). Comparing subset regression procedures. Technometrics, 20, 1-6.
[8] Belsley, D. A., Kuh, E. and Welsch, R. E. (1980). Regression Diagnostics: Identifying Influential Data and Sources of Collinearity. John Wiley & Sons, Inc., New York.
[9] Box, G. E. and Cox, D. R. (1964). An analysis of transformations. Journal of the Royal Statistical Society, Series B, 26, 211-243.
[10] Box, G. E., Hunter, W. G. and Hunter, J. S. (1978). Statistics for Experimenters. John Wiley & Sons, Inc., New York.
[11] Box, G. E., Jenkins, G. and Reinsel, G. (1994). Time Series Analysis. 3rd ed., Prentice Hall, New Jersey.
[12] Box, G. E. and Tidwell, P. W. (1962). Transformation of the independent variables. Technometrics, 4, 531-550.
[13] Casella, G. and Berger, R. (2002). Statistical Inference. 2nd ed., Duxbury, Pacific Grove, California.
Index
Adjusted R-square, 224
Alias, 361
Analysis of covariance model, 182
Analysis of variance, 82
    general linear model, 224
    lack of fit analysis, 106
    m-way, 332
    table, 86
    two-way, 332
Analysis of variance model, 182
Andrew's test, 282, 286
Assessment function, 361
Atkinson's modification, 278
Augmented coefficient matrix, 164
Autocorrelation
    Cochrane-Orcutt method, 302
    coefficient, 301
    detecting, 301
    Durbin-Watson statistic, 302
    Hildreth-Lu procedure, 304
Auxiliary regressions, 390
Average estimated variance, 367
Backward elimination, 369, 371, 409
Backward substitution, 162
Basis
    orthonormal, 148
Bayes rule, 31
Bayes' theorem, 6
Best linear unbiased estimator, 55, 217
Best test, 41
Beta function, 24, 32
Beta weights, 195
Bias, 33, 47
Bias matrix, 361
Biased estimator, 389
binary 0-1 predictors, 341
binary 0-1 responses, 341
Binomial coefficient, 9
Bivariate normal distribution, 178
BLUE estimator, 217
Bonferroni inequality, 78
Box-Cox method, 299
Box-Cox transformation, 119
Box-Tidwell method, 118, 277
Canonical form, 196, 350
Cauchy-Schwarz inequality, 143, 144, 302, 310
Centered observations, 191
Centering matrix, 173
Central limit theorem, 22, 25, 41, 76
Characteristic polynomial, 150
Chi-square distribution, 22
Cholesky factorization, 166, 188, 307
Cochrane-Orcutt method, 302
Coefficient matrix, 161
    augmented, 164
Coefficient of determination, 83
    adjusted, 362
Complex conjugate, 150
Condition index, 387
Condition number, 169, 382
    ill-conditioned, 170
    optimally conditioned, 170
    well-conditioned, 170
Conditional Poisson distribution, 46
Confidence interval, 37
    approximate, 38
    exact, 37
    lower limit, 44
    one-sided interval, 44
Consistency, 33
Convergence
    in probability, 33
Cook's distance, 272, 285
Correlation coefficient, 17
    sample, 83