a Department of Statistics, Çukurova University, 01330 Adana, Turkey
b Department of Econometrics, Çukurova University, 01330 Adana, Turkey
Abstract: Consider the linear regression model y = Xβ + u in the usual notation. In many applications the design matrix X is subject to severe multicollinearity. In this paper an alternative estimation methodology, maximum entropy, is given and used to estimate the parameters of a linear regression model when the basic data are ill-conditioned. We describe the generalized maximum entropy (GME) estimator, imposing sign restrictions on the parameters and imposing cross-parameter restrictions for GME. Mean squared error (mse) values of the estimators are estimated by the bootstrap method. We compare the generalized maximum entropy (GME) estimator with the least squares and inequality restricted least squares (IRLS) estimators on the widely analyzed dataset on Portland cement.

Keywords: General linear model, inequality restricted least squares estimator, least squares estimator, maximum entropy estimation, multicollinearity, support points.
These data come from an experimental investigation of the heat evolved during the setting and hardening of Portland cements of varied composition, and of the dependence of this heat on the percentages of four compounds in the clinkers from which the cement was produced. In this example, the dependent variable y is defined as the heat evolved in calories per gram of cement. The four compounds considered by [13] are tricalcium aluminate (X1), tricalcium silicate (X2), tetracalcium alumino ferrite (X3) and β-dicalcium silicate (X4). The four columns of the 13×4 matrix X comprise the data on x1, x2, x3, x4, respectively.

Consider the following homogeneous linear regression model:

y = Xβ + u = β1x1 + β2x2 + β3x3 + β4x4 + u.  (9)

The ordinary least squares estimates of β, with estimated standard errors (in parentheses) and the associated t-statistics with n − p = 13 − 4 = 9 degrees of freedom and two-sided critical values (in parentheses), are:

β̂1 = 2.1931 (0.1853), t = 11.84 (< 0.0001)
β̂2 = 1.1533 (0.0479), t = 24.06 (< 0.0001)
β̂3 = 0.7585 (0.1595), t = 4.76 (0.0010)
β̂4 = 0.4863 (0.0414), t = 11.74 (< 0.0001)

where σ̂²OLS = 5.845461 and mŝe(β̂OLS) = 0.0638. We note that all of the t-statistics are highly significant.

The condition number of X is κ(X) = 20.5846, so X may be considered "well-conditioned". We now, however, following Hald ([14], pp. 648-649), Gorman and Toman ([15], pp. 35-36), Daniel and Wood ([16], p. 89) and [17], add a column of ones to the matrix X and fit an inhomogeneous linear regression model with intercept to the data. Consider the following regression model:

y = β0 + β1x1 + β2x2 + β3x3 + β4x4 + u.  (10)

It is seen that the row totals of the 13×4 matrix X (without the column of ones) are all approximately equal to 100 (the compounds are all measured in percentages). We still have n = 13 observations, but there are now p = 5 unknown regression coefficients. The 13×5 matrix X now has singular values σ1 = 211.3675, σ2 = 77.2361, σ3 = 28.4597, σ4 = 10.2674 and σ5 = 0.0349. The condition number of X is now κ(X) = σ1/σ5 = 6056.37, which is exceptionally large, and so X may be considered as being quite "ill-conditioned". With an intercept, the associated inhomogeneous linear model certainly possesses "high multicollinearity", with possible effects on the OLS estimates β̂0, β̂1, β̂2, β̂3, β̂4. The ordinary least squares estimates of β, with estimated standard errors (in parentheses) and the associated t-statistics with n − p = 13 − 5 = 8 degrees of freedom and two-sided critical values (in parentheses), are:

β̂0 = 62.4054 (70.0710), t = 0.89 (0.3991)
β̂1 = 1.5511 (0.7448), t = 2.08 (0.0708)
β̂2 = 0.5102 (0.7238), t = 0.70 (0.5009)
β̂3 = 0.1019 (0.7547), t = 0.14 (0.8959)
β̂4 = −0.1441 (0.7091), t = −0.20 (0.8441)

We note that not one of these t-statistics is significant (at the 5% level). Thus, the estimated regression function is

ŷ = 62.4054 + 1.5511x1 + 0.5102x2 + 0.1019x3 − 0.1441x4,

with σ̂²OLS = 5.982955 and R² = 0.9824, which implies that 98.24% of the total variation has been explained by the regressors.

Following [17], we may improve our estimates by adding to the inhomogeneous model with intercept the linear restriction β3 = β2 − β1, i.e. Rβ = r = 0 where R = (0, 1, −1, 1, 0). We test the linear hypothesis H0: Rβ = r = 0 against H1: Rβ − r ≠ 0 in the unrestricted linear model (10). The test statistic, given our observations, is

F = (Rβ̂OLS − r)′(RS⁻¹R′)⁻¹(Rβ̂OLS − r)/σ̂² = 1.92.

Since F0.05,1,8 = 5.32, H0 is not rejected at the 5% significance level. We have the following estimates for σ²: (a) unrestricted inhomogeneous model: σ̂²OLS = 5.982955; (b) restricted inhomogeneous model with the restriction β3 = β2 − β1: σ̂²c = 6.59199.
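The OLS fits, condition numbers and the F test above can be reproduced directly. The following sketch (not the authors' code) uses numpy and the standard published values of the 13 Hald observations as tabulated in [14]; variable names such as `b_hom` and `S_inv` are ours:

```python
# Reproducing the OLS fits, condition numbers and the F test reported in the
# text, using the classic Hald cement data of [13]/[14].
import numpy as np

y = np.array([78.5, 74.3, 104.3, 87.6, 95.9, 109.2, 102.7,
              72.5, 93.1, 115.9, 83.8, 113.3, 109.4])
X = np.array([
    [7, 26, 6, 60], [1, 29, 15, 52], [11, 56, 8, 20], [11, 31, 8, 47],
    [7, 52, 6, 33], [11, 55, 9, 22], [3, 71, 17, 6], [1, 31, 22, 44],
    [2, 54, 18, 22], [21, 47, 4, 26], [1, 40, 23, 34], [11, 66, 9, 12],
    [10, 68, 8, 12]], dtype=float)

# Homogeneous model (9): y = b1*x1 + ... + b4*x4 + u
b_hom, *_ = np.linalg.lstsq(X, y, rcond=None)
s_hom = np.linalg.svd(X, compute_uv=False)
print("kappa(X) =", s_hom[0] / s_hom[-1])        # about 20.58: well-conditioned

# Inhomogeneous model (10): add a column of ones
X1 = np.column_stack([np.ones(len(y)), X])
b_inh, *_ = np.linalg.lstsq(X1, y, rcond=None)
s_inh = np.linalg.svd(X1, compute_uv=False)
print("kappa([1 X]) =", s_inh[0] / s_inh[-1])    # about 6056: ill-conditioned

# F test of the single restriction R*beta = 0 with R = (0, 1, -1, 1, 0)
R = np.array([[0.0, 1.0, -1.0, 1.0, 0.0]])
resid = y - X1 @ b_inh
sigma2 = resid @ resid / (len(y) - X1.shape[1])  # df = 13 - 5 = 8
S_inv = np.linalg.inv(X1.T @ X1)                 # S = X'X in the text's notation
d = R @ b_inh
F = (d @ np.linalg.solve(R @ S_inv @ R.T, d)) / sigma2
print("F =", F)                                  # about 1.92 < F(0.05;1,8) = 5.32
```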
Table 1 gives summary statistics for the data (y, x1, x2, x3, x4). The sample coefficient of variation is defined as CV = s/|x̄|, where s is the sample standard deviation and |x̄| is the absolute value of the sample mean.

Table 2 gives the sample correlation matrix. As can be seen from Table 2, the sample correlation coefficients are rx1x2 = 0.229, rx1x3 = −0.824, rx1x4 = −0.245, rx2x3 = −0.139, rx2x4 = −0.973 and rx3x4 = 0.030. Belsley [18] notes that correlation and collinearity are not the same: it is possible to have data that are subject to collinearity even when the correlation between pairs of explanatory variables is low.

3.1. Prior Mean of Each Parameter Is Zero

For both the coefficient and error supports, we need to select the numbers of support points, M and J respectively.

GME: First consider the parameter support. We choose M = 5 support points for each parameter. Here we specify wide bounds of [−75, 75] for the intercept term, [−4, 4] for β1 and [−2, 2] for the other coefficients. The supports are symmetric about zero, so the prior mean of each parameter is zero. Table 3 gives the parameter support for GME; for the constant term, for example:

β0 (constant): z0 = {−75, −37.5, 0, 37.5, 75} for M = 5; z0 = {−75, −50, −25, 0, 25, 50, 75} for M = 7; prior mean 0.

We also need the error support for the GME estimates. We initially construct the GME estimator with error bounds ±3σ. However, since σ is not known, we must replace it with a consistent estimator. We considered two possible estimates for σ: (1) σ̂ = 2.446 from the OLS regression, and (2) the sample standard deviation of y, s = 15.044. We used the more conservative value, s = 15.044. Using the sample standard deviation of y, the 3σ- and 4σ-rule results are:

For 3σ and J = 5: v = {−46, −23, 0, 23, 46}
For 4σ and J = 5: v = {−60, −30, 0, 30, 60}
For 3σ and J = 7: v = {−46, −30.67, −15.33, 0, 15.33, 30.67, 46}
For 4σ and J = 7: v = {−60, −40, −20, 0, 20, 40, 60}

Table 4 gives point estimates for the cement data using the OLS and GME estimators, where S3 and S4 refer to the use of the 3σ- and 4σ-rule, respectively. We have found that the results are not improved by using more than 5 support points. The results show that the GME estimates differ much from OLS in terms of the signs and magnitudes of the estimates. The GME estimates for β0 and β1 are smaller than the OLS estimates, while the GME estimates for β2, β3 and β4 are larger in magnitude than the OLS estimates. Using Table 4, we have

β̂′OLS β̂OLS = 3897.08 > β̂′GME1S3M7 β̂GME1S3M7 = 703.119 > β̂′GME1S3M5 β̂GME1S3M5 = 699.790 > β̂′GME1S4M7 β̂GME1S4M7 = 686.198 > β̂′GME1S4M5 β̂GME1S4M5 = 682.648.

On the other hand, mean squared error (mse) values of the estimators are estimated by the bootstrap method. The idea behind the bootstrap is as follows. Let β̃ be an estimate of the coefficient vector β, let σ̃² denote the estimated residual variance, and let ẽ denote the standardized estimated residual vector, which is obtained using β̃ in (2).
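The GME estimator of Section 3.1 can be written out as an explicit entropy-maximization problem. The following is a minimal sketch, not the authors' code: it assumes the supports described above ([−75, 75] for the intercept, [−4, 4] for β1, [−2, 2] for the other slopes, and the 3σ-rule error support built from s), and it solves the primal problem with scipy's SLSQP solver; helper names such as `neg_entropy` and `unpack` are our own.

```python
# Sketch of the GME estimator for model (10): beta_k = z_k' p_k, u_t = v' w_t,
# maximizing the entropy of (p, w) subject to adding-up and data constraints.
import numpy as np
from scipy.optimize import minimize

y = np.array([78.5, 74.3, 104.3, 87.6, 95.9, 109.2, 102.7,
              72.5, 93.1, 115.9, 83.8, 113.3, 109.4])
X = np.column_stack([np.ones(13), np.array([
    [7, 26, 6, 60], [1, 29, 15, 52], [11, 56, 8, 20], [11, 31, 8, 47],
    [7, 52, 6, 33], [11, 55, 9, 22], [3, 71, 17, 6], [1, 31, 22, 44],
    [2, 54, 18, 22], [21, 47, 4, 26], [1, 40, 23, 34], [11, 66, 9, 12],
    [10, 68, 8, 12]], dtype=float)])
n, K = X.shape
M = J = 5

# Parameter supports Z (K x M): symmetric about zero, so prior means are zero.
Z = np.vstack([np.linspace(-75, 75, M), np.linspace(-4, 4, M)]
              + [np.linspace(-2, 2, M)] * 3)
s = y.std(ddof=1)                      # conservative estimate, about 15.044
v = np.linspace(-3 * s, 3 * s, J)      # 3-sigma-rule error support

def unpack(theta):
    """Split the flat variable vector into p (K x M) and w (n x J)."""
    return theta[:K * M].reshape(K, M), theta[K * M:].reshape(n, J)

def neg_entropy(theta):                # objective: minimize -H(p, w)
    t = np.clip(theta, 1e-12, 1.0)
    return float(np.sum(t * np.log(t)))

def constraints(theta):                # adding-up and data constraints
    p, w = unpack(theta)
    return np.concatenate([
        p.sum(axis=1) - 1.0,                                 # p_k on simplex
        w.sum(axis=1) - 1.0,                                 # w_t on simplex
        X @ (p * Z).sum(axis=1) + (w * v).sum(axis=1) - y])  # y = X Z p + V w

theta0 = np.full(K * M + n * J, 1.0 / M)
res = minimize(neg_entropy, theta0, method="SLSQP",
               bounds=[(1e-10, 1.0)] * theta0.size,
               constraints=[{"type": "eq", "fun": constraints}],
               options={"maxiter": 1000, "ftol": 1e-10})
p_hat, w_hat = unpack(res.x)
beta_gme = (p_hat * Z).sum(axis=1)     # GME coefficient estimates
print("GME estimates:", np.round(beta_gme, 3))
```

By construction the estimates stay inside their supports, which is the shrinkage behaviour discussed in the text.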
Generalized Maximum Entropy Estimators, The Open Statistics & Probability Journal, 2011, Volume 3

For the j-th bootstrap replication, a random sample e*j of ẽ is taken with replacement and multiplied by σ̃. Using this e*j and the X matrix, y*j is computed from equation (2). For this bootstrap sample y*j and the design matrix X, the j-th bootstrap estimate β*j of β is computed. This procedure is replicated 10000 times to obtain the empirical sampling distribution of β̃. The variance and the expected value of each element of β̃ are then estimated from this sampling distribution. The scalar mse of β̃ is estimated using the bootstrap variances and expected values in

mse(β̃) = trace{cov(β̃)} + bias(β̃)′bias(β̃).

On the other hand, since the true value of β is not known, we used the OLS estimate of β to estimate the bias as biaŝ(β̃) = Ê(β̃) − β̂OLS, where Ê(β̃) is the bootstrap estimate of the expected value. The estimated mse values are as follows:

mŝe(β̂OLS) = 4402.698 > mŝe(β̂GME1S4M5) = 1326.741 > mŝe(β̂GME1S4M7) = 1320.869 > mŝe(β̂GME1S3M5) = 1306.551 > mŝe(β̂GME1S3M7) = 1300.118.

Golan et al. [3] show that the GME estimator has a lower mean squared error than the OLS, ridge and RLS estimators in several sampling experiments, particularly when the data exhibit a high degree of collinearity. We now modify our parameter support to account for the expected signs of the coefficients.

3.2. Imposing Sign Restrictions of Parameters for GME

R1GME: If we expect the coefficients to have a positive effect, we modify the parameter support so that the prior mean of each coefficient is positive. We impose correct sign restrictions for all the parameters. The prior mean is simply the value toward which the parameters are shrunk, not a binding restriction; when we restrict the parameter estimates to be positive, we obviously must specify a positive prior mean. For the R1GME estimator, we specify the parameter support for β1 as {0, 0.5, 1, 1.5, 2}, which has prior mean 1. We choose M = 5 support points for each parameter. Table 5 gives the parameter support for R1GME. We again specify error supports using ±3σ̂ and ±4σ̂.

3.3. Imposing Cross Parameter Restrictions for GME

R2GME: We now estimate the Hald data (cement data) model with sign restrictions plus the additional restriction that β1 − β4 > 0. This is just a made-up restriction to illustrate how to impose cross restrictions.
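The sign-restricted support of Section 3.2 is easy to check numerically. A small sketch (our own illustration) builds the positive support for β1 and confirms that its prior mean, the expected value of β1 under uniform probabilities over the support points, is 1:

```python
# Sign-restricted (R1GME) parameter support for beta_1, as in Section 3.2:
# five equally spaced points on [0, 2]. Under uniform weights of 1/5 on each
# support point, the prior mean is the plain average of the support.
import numpy as np

z1 = np.linspace(0.0, 2.0, 5)          # {0, 0.5, 1, 1.5, 2}
prior_mean = z1.mean()                 # uniform-prior expected value of beta_1
print(z1, prior_mean)

# Any probability vector p on this support yields a nonnegative estimate,
# so the sign restriction holds automatically (illustrative p below).
p = np.array([0.1, 0.2, 0.3, 0.25, 0.15])
beta1 = z1 @ p
assert beta1 >= 0.0
```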
Table 6. Estimates with a Single Cross-Parameter Restriction (β0 > 0, β1 > 0, β2 > 0, β3 > 0, β4 > 0)

Variable   Parameter   OLS      IRLS     R2GME (±3σ̂)     R2GME (±4σ̂)
Constant   β0          62.405   48.194   34.852 (1.961)   36.696 (1.856)
x1         β1          1.551    1.696    1.075 (0.044)    1.033 (0.029)
x2         β2          0.510    0.657    0.871 (0.032)    0.844 (0.026)
x3         β3          0.102    0.250    0.193 (0.002)    0.194 (0.001)
x4         β4          -0.144   0.000    0.319 (0.013)    0.334 (0.012)
We include this restriction to illustrate the use of a non-block-diagonal parameter support matrix. We specify the restricted GME support matrix using

(β1, β4)′ = Z (p1, p4)′, with Z = [z1  z4; 0  z4],

so that β1 = z1p1 + z4p4 and β4 = z4p4, where Z is the 2 × 2M submatrix of support points for β1 and β4, and p1 and p4 represent the unknown probabilities associated with the support points for tricalcium aluminate (X1) and β-dicalcium silicate (X4), respectively. For this set of restrictions, we specify the parameter supports for R2GME as follows.

R2GME: Both β̂1 and β̂4 are constrained to be positive because of the support of β4. The estimate for β1 is β̂1 = β̂4 + z1p̂1 ≥ β̂4, and the prior mean of the estimate for β1 is β̂4 + 0.6. We again specify error supports using ±3σ̂ and ±4σ̂.

Finally, we include an example imposing multiple inequality restrictions through the parameter support matrix. In addition to the parameter sign restrictions, we impose the restrictions β1 ≥ β4 and β2 ≥ β1 + β3. Using matrix notation we have:

[0   1   0   0  −1] (β0, β1, β2, β3, β4)′ ≥ (0, 0)′.
[0  −1   1  −1   0]

To impose these restrictions, we specify the R3GME support matrix as (β0 > 0, β1 > 0, β2 > 0, β3 > 0, β4 > 0):

β = Zp:

(β0)   [z0   0    0    0    0 ] (p0)
(β1)   [0    z1   0    0    z4] (p1)
(β2) = [0    z1   z2   z3   z4] (p2)
(β3)   [0    0    0    z3   0 ] (p3)
(β4)   [0    0    0    0    z4] (p4)

The restrictions are satisfied, as

β̂1 = z1p̂1 + z4p̂4 = β̂4 + z1p̂1 ≥ β̂4,
β̂2 = z1p̂1 + z2p̂2 + z3p̂3 + z4p̂4 = (β̂1 − z4p̂4) + z2p̂2 + β̂3 + z4p̂4 = β̂1 + β̂3 + z2p̂2 ≥ β̂1 + β̂3.
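The non-block-diagonal support structure above enforces the inequalities for any admissible probability vectors, provided the supports are nonnegative (as the sign restrictions require). A small numpy check of this claim, with illustrative supports of our own choosing:

```python
# Verifying that the R3GME support structure enforces beta_1 >= beta_4 and
# beta_2 >= beta_1 + beta_3 for ANY probability vectors, as claimed above.
# The nonnegative supports z_k below are illustrative, not the paper's.
import numpy as np

rng = np.random.default_rng(0)
M = 5
z1 = np.linspace(0, 2, M)
z2 = np.linspace(0, 2, M)
z3 = np.linspace(0, 2, M)
z4 = np.linspace(0, 2, M)

def simplex(rng, m):
    """Random probability vector of length m."""
    x = rng.random(m)
    return x / x.sum()

for _ in range(1000):
    p1, p2, p3, p4 = (simplex(rng, M) for _ in range(4))
    b4 = z4 @ p4                                 # row (0, 0, 0, 0, z4)
    b1 = z1 @ p1 + z4 @ p4                       # row (0, z1, 0, 0, z4)
    b3 = z3 @ p3                                 # row (0, 0, 0, z3, 0)
    b2 = z1 @ p1 + z2 @ p2 + z3 @ p3 + z4 @ p4   # row (0, z1, z2, z3, z4)
    assert b1 >= b4 - 1e-9                       # since z1 @ p1 >= 0
    assert b2 >= b1 + b3 - 1e-9                  # since z2 @ p2 >= 0
print("restrictions hold for all sampled probability vectors")
```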
Table 7. Estimates with Multiple Cross-Parameter Restrictions (β0 > 0, β1 > 0, β2 > 0, β3 > 0, β4 > 0)

Variable   Parameter   OLS      IRLS     R3GME (±3σ̂)     R3GME (±4σ̂)
Constant   β0          62.405   48.194   34.744 (2.023)   36.786 (1.873)
x1         β1          1.551    1.696    0.966 (0.030)    0.956 (0.021)
x2         β2          0.510    0.657    0.883 (0.035)    0.850 (0.028)
x3         β3          0.102    0.250    0.192 (0.002)    0.194 (0.002)
x4         β4          -0.144   0.000    0.331 (0.015)    0.340 (0.013)

Bootstrap standard error estimates are given in parentheses.
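The residual bootstrap behind these standard errors and the mse comparison of Section 3.1 can be sketched as follows. This is our own minimal illustration for the OLS estimator on the Hald data; the paper's exact standardization details are not fully specified, so the printed value need not match the reported 4402.698 exactly, and swapping a GME routine in for `estimator` would give the GME mse instead.

```python
# Residual-bootstrap estimate of the scalar mse,
#   mse(b) = trace{cov(b)} + bias(b)' bias(b),
# with the bias measured against the OLS estimate, as described in the text.
import numpy as np

y = np.array([78.5, 74.3, 104.3, 87.6, 95.9, 109.2, 102.7,
              72.5, 93.1, 115.9, 83.8, 113.3, 109.4])
X = np.column_stack([np.ones(13), np.array([
    [7, 26, 6, 60], [1, 29, 15, 52], [11, 56, 8, 20], [11, 31, 8, 47],
    [7, 52, 6, 33], [11, 55, 9, 22], [3, 71, 17, 6], [1, 31, 22, 44],
    [2, 54, 18, 22], [21, 47, 4, 26], [1, 40, 23, 34], [11, 66, 9, 12],
    [10, 68, 8, 12]], dtype=float)])

def estimator(Xm, ym):                       # OLS here; replace with GME if desired
    return np.linalg.lstsq(Xm, ym, rcond=None)[0]

b = estimator(X, y)
resid = y - X @ b
sigma = np.sqrt(resid @ resid / (len(y) - X.shape[1]))
e_std = resid / resid.std(ddof=0)            # standardized residuals

rng = np.random.default_rng(42)
B = 10000
boot = np.empty((B, X.shape[1]))
for j in range(B):
    e_star = rng.choice(e_std, size=len(y), replace=True) * sigma
    y_star = X @ b + e_star                  # regenerate y from the fitted model
    boot[j] = estimator(X, y_star)

cov = np.cov(boot, rowvar=False)
bias = boot.mean(axis=0) - b                 # bias measured against OLS
mse = np.trace(cov) + bias @ bias
print("bootstrap mse:", mse)
```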
Variable   Parameter   OLS      IRLS     GME (±3σ̂)       GME (±4σ̂)
Constant   β0          62.405   0.000    25.531 (1.380)   25.617 (1.116)
x1         β1          1.551    1.911    0.643 (0.026)    0.633 (0.018)
x2         β2          0.510    1.202    1.182 (0.049)    1.211 (0.045)
x3         β3          0.102    0.710    0.160 (0.004)    0.164 (0.003)
x4         β4          -0.144   0.491    0.272 (0.011)    0.265 (0.008)

Bootstrap standard error estimates are given in parentheses.
...parameter support matrix when there are linear inequality restrictions and multiple restrictions.

We have seen that estimation is not improved by choosing more than about five support points. Varying the width of the error bounds has a larger impact on the estimates. For the Portland cement data example, we conclude that error bounds of ±4σ̂ are preferred over error bounds of ±3σ̂. The GME estimator is a shrinkage estimator in which the parameter estimates are shrunk towards the prior mean, which is based on nonsample information. We have demonstrated how to impose restrictions on the GME estimator using a simple example of sign restrictions and new parameter support matrices.

REFERENCES

[1] C. E. Shannon, "A mathematical theory of communication", Bell Syst. Tech. J., vol. 27, pp. 379-423, 1948.
[2] E. T. Jaynes, "Information theory and statistical mechanics", Phys. Rev., vol. 106, pp. 620-630, 1957.
[3] A. Golan, G. Judge, and D. Miller, Maximum Entropy Econometrics, John Wiley: New York, 1996.
[4] R. C. Campbell, and R. C. Hill, "A Monte Carlo study of the effect of design characteristics on the inequality restricted maximum entropy estimator", Rev. App. Econ., vol. 1, pp. 53-84, 2005.
[5] R. C. Campbell, and R. C. Hill, "Imposing parameter inequality restrictions using the principle of maximum entropy", J. Stat. Comput. Simul., vol. 76, pp. 985-1000, 2006.
[6] X. Wu, "A weighted generalized maximum entropy estimator with a data-driven weight", Entropy, vol. 11, pp. 917-930, 2009.
[7] I. Fraser, "An application of maximum entropy estimation: the demand for meat in the United Kingdom", App. Econ., vol. 32, pp. 45-59, 2000.
[8] E. Z. Shen, and J. M. Perloff, "Maximum entropy and Bayesian approaches to the ratio problem", J. Appl. Econ., vol. 104, pp. 289-313, 2001.
[9] H. K. Mishra, "Estimation under multicollinearity: application of restricted Liu and maximum entropy estimators to the Portland cement dataset", MPRA Paper 1809, University Library of Munich, Germany, 2004.
[10] M. R. Caputo, and Q. Paris, "Comparative statics of the generalized maximum entropy estimator of the general linear model", EJOR, vol. 185, pp. 195-203, 2008.
[11] F. Pukelsheim, "The three sigma rule", Am. Statistician, vol. 48, pp. 88-91, 1994.
[12] A. Golan, and J. M. Perloff, "Comparison of maximum entropy and higher-order entropy estimators", J. Econ., vol. 107, pp. 195-211, 2002.
[13] H. Woods, H. H. Steinor, and H. R. Starke, "Effect of composition of Portland cement on heat evolved during hardening", Ind. Eng. Chem. Res., vol. 24, pp. 1207-1214, 1932.
[14] A. Hald, Statistical Theory with Engineering Applications, John Wiley: New York, 1952.
[15] J. W. Gorman, and R. J. Toman, "Selection of variables for fitting equations to data", Technometrics, vol. 8, pp. 27-51, 1966.
[16] C. Daniel, and F. S. Wood, Fitting Equations to Data: Computer Analysis of Multifactor Data, John Wiley: New York, 1980.
[17] S. Kaçıranlar, S. Sakallıoğlu, F. Akdeniz, G. P. H. Styan, and H. J. Werner, "A new biased estimator in linear regression and a detailed analysis of the widely-analysed data set on Portland cement", Sankhyā, Series B, vol. 61, pp. 443-459, 1999.
[18] D. A. Belsley, Conditioning Diagnostics: Collinearity and Weak Data in Regression, John Wiley: New York, 1991.
Received: December 02, 2010 Revised: February 15, 2011 Accepted: February 27, 2011