Volume 11 | Issue 2 Article 15
11-1-2012
Recommended Citation
Khalaf, Ghadban (2012). "A Proposed Ridge Parameter to Improve the Least Square Estimator," Journal of Modern Applied Statistical Methods: Vol. 11, Iss. 2, Article 15.
DOI: 10.22237/jmasm/1351743240
Available at: http://digitalcommons.wayne.edu/jmasm/vol11/iss2/15
A Proposed Ridge Parameter to Improve the Least Square Estimator
Cover Page Footnote
This research was financed by King Khalid University, Abha, Saudi Arabia, Project No. 182.
This regular article is available in Journal of Modern Applied Statistical Methods: http://digitalcommons.wayne.edu/jmasm/vol11/iss2/15
Journal of Modern Applied Statistical Methods, November 2012, Vol. 11, No. 2, 443-449. Copyright © 2012 JMASM, Inc. ISSN 1538-9472/12/$95.00.
Ghadban Khalaf
King Khalid University,
Saudi Arabia
Ridge regression, a form of biased linear estimation, is a more appropriate technique than ordinary least squares (OLS) estimation in the case of highly intercorrelated explanatory variables in the linear regression model Y = Xβ + u. Two proposed ridge regression parameters are evaluated from the mean square error (MSE) perspective. A simulation study was conducted to compare the performance of the proposed estimators with that of the OLS, HK and HKB estimators. Results show that the suggested ridge parameters outperform the OLS estimator and the other estimators in all situations examined.
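The contrast the abstract describes can be made concrete with a small numerical sketch. Everything below is illustrative only and not taken from the paper: the data, the variable names, and the choice k = 1 are invented. Two near-duplicate predictors drive the variance inflation factor into the thousands, yet the ridge system (X′X + kI) remains comfortably invertible.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative data: x2 is almost a copy of x1, so X'X is near-singular.
n = 50
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.01, size=n)   # near-collinear predictor
X = np.column_stack([np.ones(n), x1, x2])  # intercept plus two predictors
beta_true = np.array([1.0, 2.0, 3.0])
y = X @ beta_true + rng.normal(size=n)

# OLS: beta_hat = (X'X)^{-1} X'y
XtX = X.T @ X
beta_ols = np.linalg.solve(XtX, X.T @ y)

# Ridge: beta* = (X'X + k I)^{-1} X'y, with an arbitrary illustrative k
k = 1.0
beta_ridge = np.linalg.solve(XtX + k * np.eye(X.shape[1]), X.T @ y)

# VIF of x1: 1 / (1 - R^2) from regressing x1 on the remaining predictors
Z = np.column_stack([np.ones(n), x2])
coef = np.linalg.lstsq(Z, x1, rcond=None)[0]
resid = x1 - Z @ coef
tss = float((x1 - x1.mean()) @ (x1 - x1.mean()))
r2 = 1.0 - float(resid @ resid) / tss
vif_x1 = 1.0 / (1.0 - r2)
print(vif_x1)  # typically in the thousands here, far above common alarm levels
```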
Ghadban Khalaf is an Associate Professor of Statistics in the Department of Mathematics and a member of the Faculty of Science, King Khalid University. Email him at: albadran50@yahoo.com.

Introduction

Consider the standard model for multiple linear regression,

Y = β₀1 + Xβ + u,  (1)

where Y is an (n × 1) column vector of observations on the dependent variable, β₀ is a scalar intercept, 1 is an (n × 1) vector with all components equal to unity, X is an (n × p) fixed matrix of observations on the explanatory variables and is of full rank p, β is a (p × 1) unknown column vector of regression coefficients, and u is an (n × 1) vector of random errors with E(u) = 0 and E(uu′) = σ²Iₙ, where Iₙ denotes the (n × n) identity matrix and the prime denotes the transpose of a matrix.

The OLS estimator, β̂, of the parameters is given by

β̂ = (X′X)⁻¹X′Y,  (2)

where β̂ is an unbiased estimator of β.

Multiple linear regression is very sensitive to predictors that are in a configuration of near collinearity. When this is the case, the model parameters become unstable (large variances) and cannot be interpreted. From a mathematical standpoint, near collinearity makes the X′X matrix ill-conditioned (with X the data matrix); that is, the value of its determinant is nearly zero, so attempts to calculate the inverse of the matrix result in numerical difficulties with uncertain final values.

Exact collinearity occurs when at least one of the predictors is a linear combination of other predictors. In that case X is not a full rank matrix, the determinant of X′X is exactly zero, and inverting X′X is not merely difficult: the inverse does not exist.

When multicollinearity occurs, the least squares estimates remain unbiased and efficient. The problem is that the estimated standard error of the coefficient βᵢ (for example, Sᵦᵢ) tends to be inflated. This standard error has a tendency to be larger than it would be in the absence of multicollinearity because the estimates are very sensitive to any changes in the sample observations or in the model specification. In other words, including or excluding a particular variable or certain observations may greatly
change the estimated partial coefficient. If Sᵦᵢ is larger than it should be, then the t-value for testing the significance of βᵢ is smaller than it should be. Thus, it becomes more likely to conclude that a variable Xᵢ is not important in a relationship when, in fact, it is important.

Several criteria have been put forth to detect multicollinearity problems. Draper and Smith (1998) suggested the following:

(1) Check if any regression coefficients have the wrong sign, based on prior knowledge.

(2) Check if predictors anticipated to be important based on prior knowledge have regression coefficients with small t-statistics.

(3) Check if deletion of a row or a column of the X matrix produces a large change in the fitted model.

(4) Check the correlations between all pairs of predictor variables to determine if any are unexpectedly high.

(5) Examine the variance inflation factor (VIF). The VIF of Xᵢ is given by

VIFᵢ = 1 / (1 − Rᵢ²),  (3)

where Rᵢ² is the squared multiple correlation coefficient resulting from the regression of Xᵢ against all other explanatory variables.

If Xᵢ has a strong linear relation with other explanatory variables, then Rᵢ² will be close to one and VIF values will tend to be very high. However, in the absence of any linear relation among explanatory variables, Rᵢ² will be zero and the VIF will equal one. It is known that a VIF value greater than one indicates deviation from orthogonality and a tendency toward collinearity. Leclerc and Pireaux (1995) suggested that a VIF value exceeding 300 may indicate the presence of multicollinearity. Conversely, examining a pairwise correlation matrix of explanatory variables might be insufficient to identify collinearity problems because near linear dependencies may exist among more complex combinations of regressors; that is, pairwise independence does not imply independence. Because the VIF is a function of the multiple correlation coefficient among the explanatory variables, it is a much more informative tool for detecting multicollinearity than the simple pairwise correlations.

Many procedures have been suggested in an attempt to overcome the effects of multicollinearity in regression analysis. Hoerl and Kennard (1970) proposed a class of biased estimators, called ridge regression estimators, as an alternative to the OLS estimator in the presence of collinearity. Freund and Wilson (1998) summarize these procedures into three classes: variable selection, variable redefinition and biased estimation, such as ridge regression.

Ridge regression is a variant of ordinary multiple linear regression whose goal is to circumvent the problem of predictor collinearity. Ridge regression gives up the OLS estimator as a method for estimating the parameters of the model and focuses instead on the X′X matrix; this matrix is artificially modified to make its determinant appreciably different from zero. The idea is to add a small positive quantity, say k, to each of the diagonal elements of the matrix X′X in order to reduce the linear dependencies observed among its columns. A solution vector is thus obtained by the expression

β̂* = (X′X + kIₚ)⁻¹X′Y,  (4)

where the ridge parameter k > 0 represents the degree of shrinkage and Iₚ is an identity matrix of the same order as X′X. By adding the term kIₚ, the ridge-regression model reduces multicollinearity and prevents the matrix X′X from being singular even if X itself is not of full rank.

Note that if k = 0, the ridge-regression coefficients, defined by (4), are equal to those from the traditional multiple-regression model given by (2). The modification makes the new model parameters somewhat biased, that is, E(β̂*) ≠ β (whereas the parameters as calculated by the OLS method are unbiased estimators of the true parameters). However, the variances of the new parameters are smaller than those of the OLS parameters and, in fact, so much smaller that their MSE may also be smaller than that of the parameters of the least squares model. This illustrates the fact that a biased estimator may outperform an unbiased estimator provided its variance is small enough.

Perhaps the best way of choosing the ridge regression parameter k would be to minimize the expected squared difference between the estimate and the parameter being estimated, that is, the MSE. This would reveal the ideal balance between the increase in bias and the reduction in variance of the estimator, where

MSE = Variance + (Bias)².  (5)

Therefore, it is helpful to allow a small bias in order to achieve the main criterion of keeping the MSE small: this is precisely what ridge regression seeks to accomplish.

Several methods for estimating k have been proposed; for example, see Hoerl and Kennard (1970), Hoerl, et al. (1975), McDonald and Galarneau (1975), Lawless and Wang (1976), Hocking, et al. (1976), Wichern and Churchill (1978), Nordberg (1982), Saleh and Kibria (1993), Singh and Tracy (1999), Wencheko (2000), Kibria (2003), Khalaf and Shukur (2005), Alkhamisi, et al. (2006), Alkhamisi and Shukur (2007), Khalaf (2011) and Khalaf, et al. (2012).

The Main Result

Identifying the optimal method for choosing k is beyond the goal of this study. Hoerl and Kennard (1970) showed that the optimal values for kᵢ will be

k̂ᵢ = σ̂² / β̂ᵢ²,  i = 1, 2, ..., p.  (6)

The acronym HK is used for this estimator. Hoerl and Kennard (1970) stated that "based on experience the best method for achieving a better estimator β̂* is to use k̂ᵢ = k for all i." Thus, the k̂ᵢ-values of (6) can be combined to obtain a single value of k. However, it is not advisable to use an ordinary average because a large k, and too much bias, would result. Hoerl, et al. (1975) therefore proposed a more reasonable averaging, namely the harmonic mean given by

k̂_HKB = pσ̂² / (β̂′β̂),  (7)

where p denotes the number of parameters and σ̂² is given by

σ̂² = RSS / (n − p),  (8)

where RSS denotes the residual sum of squares; the acronym HKB is used for estimator (7). The original definition of k provided by Hoerl and Kennard (1970) and Hoerl, et al. (1975) is used throughout this article to suggest the proposed estimators as modifications of their estimators. It is known that the denominator (n − p + 2) yields an estimator of σ² with a lower MSE than the unbiased estimator given by (8) (Rao, 1973). Thus, the use of

σ̂*² = RSS / (n − p + 2)  (9)

is suggested to estimate σ² in both (6) and (7). This leads to the following new estimators.
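For reference, the quantities in equations (6) through (9) can be computed directly. The following is a minimal Python sketch, not the paper's code; the single-k HK rule shown uses σ̂²/max(β̂ᵢ²), a common convention that this excerpt does not state explicitly, and the paper's own proposed estimators are not reproduced here.

```python
import numpy as np

def ridge_k_estimates(X, y):
    """Return (s2, s2_star, k_hk, k_hkb) for the ridge rules in (6)-(9).

    s2      -- unbiased sigma^2 estimate, RSS / (n - p), equation (8)
    s2_star -- lower-MSE variant, RSS / (n - p + 2), equation (9)
    k_hk    -- single-k HK rule, sigma^2 / max(beta_i^2) (assumed convention)
    k_hkb   -- harmonic-mean HKB rule, p * sigma^2 / (beta' beta), equation (7)
    """
    n, p = X.shape
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)   # OLS estimator, equation (2)
    rss = float(np.sum((y - X @ beta_hat) ** 2))   # residual sum of squares
    s2 = rss / (n - p)
    s2_star = rss / (n - p + 2)
    k_hk = s2 / float(np.max(beta_hat ** 2))
    k_hkb = p * s2 / float(beta_hat @ beta_hat)
    return s2, s2_star, k_hk, k_hkb
```

Substituting σ̂*² from (9) for σ̂² in (6) and (7) yields the modified ridge parameters in the same way; only the denominator of the variance estimate changes.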
Table 1: The MSE of the Suggested Estimators, HK, HKB and the OLS Estimator (n = 20)
Table 2: The MSE of the Suggested Estimators, HK, HKB and the OLS Estimator (n = 100)
Table 3: The MSE of the Suggested Estimators, HK, HKB and the OLS Estimator (n = 1,000)
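Comparisons like those in Tables 1-3 can be reproduced in spirit, though not in detail: the paper's factor levels, correlation structure, and proposed estimators are not shown in this excerpt, so the design below (equicorrelated predictors with correlation rho, unit coefficients) is my assumption, used only to illustrate how such an MSE table is generated.

```python
import numpy as np

def mse_comparison(n=20, p=4, rho=0.99, sigma=1.0, reps=500, seed=1):
    """Monte Carlo MSE of OLS versus HKB ridge under collinear predictors.

    Illustrative only: the simulation protocol is assumed, not the paper's.
    Returns (mse_ols, mse_ridge), each an average squared estimation error.
    """
    rng = np.random.default_rng(seed)
    # Equicorrelated predictor covariance with off-diagonal correlation rho
    cov = np.full((p, p), rho) + (1.0 - rho) * np.eye(p)
    L = np.linalg.cholesky(cov)
    beta = np.ones(p)
    se_ols = se_ridge = 0.0
    for _ in range(reps):
        X = rng.normal(size=(n, p)) @ L.T
        y = X @ beta + rng.normal(scale=sigma, size=n)
        XtX = X.T @ X
        b_ols = np.linalg.solve(XtX, X.T @ y)           # equation (2)
        rss = float(np.sum((y - X @ b_ols) ** 2))
        s2 = rss / (n - p)                              # equation (8)
        k = p * s2 / float(b_ols @ b_ols)               # HKB rule, equation (7)
        b_ridge = np.linalg.solve(XtX + k * np.eye(p), X.T @ y)  # equation (4)
        se_ols += float((b_ols - beta) @ (b_ols - beta))
        se_ridge += float((b_ridge - beta) @ (b_ridge - beta))
    return se_ols / reps, se_ridge / reps
```

Under strong collinearity the averaged squared error of the HKB ridge estimator comes out well below that of OLS, which is the qualitative pattern the tables report.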
Nordberg, L. (1982). A procedure for determination of a good ridge parameter in linear regression. Communications in Statistics, A11, 285-309.
Rao, C. R. (1973). Linear statistical inference and its applications. New York, NY: John Wiley and Sons.
Saleh, A. K., & Kibria, B. M. (1993). Performances of some new preliminary test ridge regression estimators and their properties. Communications in Statistics - Theory and Methods, 22, 2747-2764.
Singh, S., & Tracy, D. S. (1999). Ridge regression using scrambled responses. Metrika, 147-157.
Wencheko, E. (2000). Estimation of the signal-to-noise in the linear regression model. Statistical Papers, 41, 327-343.
Wichern, D., & Churchill, G. (1978). A comparison of ridge estimators. Technometrics, 20, 301-311.