
HANDLING MULTICOLLINEARITY WITH SAS IML® SOFTWARE: RIDGE REGRESSION ON THE PC

GARY WILLIAM CARR, NYNEX
GERALD TLAPA, BELLCORE
1.0 Introduction.

Analyses of economic data must consider the likely presence of multicollinearity. If one is interested in studying Gross Domestic Product (GDP) versus other macroeconomic infrastructure components, multicollinearity will be anticipated, since components of the infrastructure generally march in harmony with GDP. For example, many correlation studies have shown a strong direct relationship between GDP and Energy. (1) In particular, this relationship has been analyzed by Janosi and Grayson with a resulting R² ≥ 0.9 in thirty-two of the thirty-four cases studied. These results were obtained by using a log-log model assuming constant rates of continuous growth. (2)

As the number of variables increases within an econometric model, the concern with multicollinearity increases, since the probability that some variables measure similar phenomena increases. The presence of multicollinearity violates the assumption that the explanatory variables in a regression model are not strongly interrelated. This assumption is one of the three fundamental assumptions of regression analysis. The consequence of violating it is low precision of the individual regression coefficient estimates, which in turn can lead to erroneous inferences.

Normal use of ordinary least squares regression assumes that the input variables are uncorrelated. In addition, the researcher wants as many observations as possible per variable to ensure that a purely random component will be less likely to affect inferences about the deterministic portion of the equation. Economic data, however, are often sparse, especially with regard to developing countries, and may, to a significant degree, measure the same basic phenomena. Economic data therefore almost always display multicollinearity. This is a common problem that must be addressed when modeling economic effects. Minimizing multicollinearity maximizes the explanatory power of any model chosen to describe causal economic relationships.

Multicollinearity can be defined as a property of the correlation matrix in which the off-diagonal elements (independent regressors) approach 1. (3) When significant multicollinearity exists, it is impossible to determine the importance of each independent (regressor) variable in explaining a dependent variable based on R². (4) One specific focus of this paper addresses multicollinearity using the ridge regression technique. Ridge regression provides a statistically robust method for overcoming data problems frequently encountered in econometric modeling using ordinary least squares methodologies.

2.0 Analysis.

Assume that the relationships of any given country's infrastructure components relative to its GDP are sought, such that they can be expressed in terms of a general linear regression model:

    Y = XB + e                                    (1)

where Y is a vector of observations (Ln GDP in subsequent analyses of this study), X is a matrix of infrastructure variable observations (infrastructure components), B is a vector of parameters, and e is a vector of errors normally distributed with expected value E(e) = 0 and Var(e) = σ²I. In this case the elements of variance are uncorrelated. Since E(e) = 0,

    E(Y) = E(XB)                                  (2)

The least squares estimate (LSE) of B is the value b which minimizes the error sum of squares:

    e'e = (Y - Xb)'(Y - Xb)                       (3)

where b is the LSE of B and ( )' indicates the matrix transpose. This provides the normal equation:

    (X'X)b = X'Y                                  (4)

where (X'X) is the correlation matrix and is non-singular.

    b = (X'X)⁻¹X'Y                                (5)

where (X'X)⁻¹ is the inverse of the correlation matrix. The solution b has the following properties:
1) It estimates B with minimized error sum of squares irrespective of the distribution function of the errors.
2) The elements of b are linear functions of the observations Y1, Y2, ..., Yn and provide unbiased estimates of the elements of B which have the minimum variances irrespective of the distribution functions of the errors.

Consider the variables of energy, telecommunications, domestic investment, airline travel and savings that produce a correlation matrix, (X'X), below.

CORRELATION MATRIX (X'X) FOR CHINA (1961-1985)

          Ln(ENG)  Ln(TEL)  Ln(AIR)  Ln(INV)  Ln(SAV)  Ln(GDP)
Ln(ENG)   1.0000   0.9183   0.8786   0.9131   0.9110   0.8911
Ln(TEL)   0.9183   1.0000   0.9257   0.9313   0.9349   0.9259
Ln(AIR)   0.8786   0.9257   1.0000   0.9618   0.9649   0.9902
Ln(INV)   0.9131   0.9313   0.9618   1.0000   0.9992   0.9813
Ln(SAV)   0.9110   0.9349   0.9649   0.9992   1.0000   0.9844
Ln(GDP)   0.8911   0.9259   0.9902   0.9813   0.9844   1.0000

ENG - Level of energy consumption expressed in million tons of coal equivalent. (UN).
TEL - Number of telephones installed. (AT&T).
AIR - Number of airline passengers carried. (UN).
INV - Domestic investment. (World Bank).
SAV - Domestic savings. (World Bank).
GDP - Gross domestic product. (IMF).
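The least squares computation of equations (4)-(5) can be sketched with NumPy. This is an illustration only: the paper's own computations are done in SAS IML, and the toy data below is synthetic, not the China series.

```python
import numpy as np

def ols_normal_equations(X, Y):
    """Least squares estimate b solving (X'X)b = X'Y (equations 4-5)."""
    XtX = X.T @ X
    XtY = X.T @ Y
    # solve the normal equation rather than inverting (X'X) explicitly
    return np.linalg.solve(XtX, XtY)

# synthetic data with known coefficients (not the China data)
rng = np.random.default_rng(0)
X = rng.normal(size=(25, 3))
Y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=25)
b = ols_normal_equations(X, Y)
```

Solving the normal equation directly is numerically preferable to forming (X'X)⁻¹, though with near-singular (X'X) both approaches become unstable, which is precisely the problem ridge regression addresses.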

Each "independent" infrastructure variable shows a high correlation to each other as well as to Ln(GDP). Alternatively, multicollinearity can be observed in the inverse of this correlation matrix, (X'X)⁻¹, measured by the high diagonal values above 1. (See the inverse correlation matrix below.) While independent variables should ideally not be correlated with each other, economic data on infrastructure components will show some degree of correlation to each other. Collinearity among the independent variables must therefore be kept at acceptable levels in the regression model. The diagonal elements of the inverse matrix are called Variance Inflation Factors (VIF).

    VIFi = 1/(1 - Ri²)                            (6)

Ri² is the coefficient of determination of the i-th independent variable regressed on all other independent variables. (5) As an example, using the China data with the independent variable Ln(SAV) regressed on all other independent variables, one obtains an R² value of .9987. When this R² value is introduced into equation 6 above, VIF = 778.084. When the VIFi value exceeds the value of 10 (as identified by Freund and Littell) (obtained by substituting the R-squared value of the total regression results into the VIF formula), the presence of unacceptable multicollinearity is identified. (6) The analyst must now select which VIF element is most closely related to, and not independent from, the other independent variables. Correction of the data is now required to eliminate the presence of collinearity. Consider once again the inverse correlation matrix:

INVERSE CORRELATION MATRIX (X'X)⁻¹ FOR CHINA (1961-1985)

          Ln(ENG)   Ln(TEL)   Ln(AIR)   Ln(INV)   Ln(SAV)
Ln(ENG)     8.225    -5.036     1.035   -21.951    18.150
Ln(TEL)    -5.036    11.768    -3.227    24.936   -28.216
Ln(AIR)     1.035    -3.227    16.113    15.484   -28.945
Ln(INV)   -21.951    24.936    15.484   722.240  -739.911
Ln(SAV)    18.150   -28.216   -28.945  -739.911   778.084

Several remedies are frequently suggested to correct poorly conditioned data. The general approach is usually to collect more data. This answer rarely helps the econometrician, who typically has short data series and cannot wait for additional data to be obtained. In addition, the cost of obtaining additional data may be prohibitive and cannot guarantee a sample with reduced collinearity. (7) Another common approach is to reduce the number of independent variables in the model if similar phenomena are being measured by many independent variables. This may not be feasible if one is trying to determine the importance of several variables influencing one dependent variable. A procedure more robust than ordinary least squares (OLS) regression is appropriate in this instance.

To overcome problems with data quality and improve the reliability of data analyses and forecasts, econometricians may (intentionally) introduce bias into their models. This serves the purpose of reducing standard errors and multicollinearity. Indirect introduction of bias into a model occurs when one of the variables which is collinear with another variable is dropped. By reducing the information input into a model, collinearity is reduced. Another procedure that can be used introduces bias directly. One such robust procedure is ridge regression, which is becoming increasingly popular in econometric analyses. Hoerl and Kennard (1970) are cited frequently for their use of this technique. The guiding principles which they have followed are listed below. (8):

1) As bias, k, is added to the diagonal elements of the correlation matrix, the resulting coefficients will stabilize and have characteristics similar to an orthogonal system.
2) The actual coefficients will have reasonable absolute values respective to the factors they represent.
3) The proper sign will be assigned to all coefficients.
4) The residual sum of squares will not be inflated to an unreasonable value. The amount of variance will not be large relative to the process generating the data.

The ridge regression procedure is intended to overcome multicollinearity problems where the correlation matrix is nearly 1.0, giving rise to unstable parameter estimates. Additional work on this method by G.M. Mullett has explained why an incorrectly signed coefficient becomes corrected. (9) G. Jelisavcic justifies this technique by demonstrating the lower mean squared error which it produces and its ability to choose the proper bias, k, required for the variables being analyzed. (10) The work of Chatterjee and Price supports this technique by showing how it minimizes the mean squared error when a regression equation is used to predict future values. (11) Draper and Smith confirm the use of ridge regression when prior knowledge of the parameters is known (lower coefficient values, or the sign of a coefficient is incorrect), and also when ridge regression is subject to restrictions on the parameters (a least squares problem with the addition of restrictions or constraints based on external information). (12) T.H. Wonnacott demonstrates that in using ridge regression to avoid multicollinearity, the confidence intervals of relevant regressor variables are more precise. (13)

Ridge regression is not a panacea for all economic problems, but in many instances it has led to improved understanding of available data.
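Equation (6) and the inverse-correlation-matrix view of VIF can be illustrated with a short NumPy sketch. The data here is hypothetical (not the China series): two regressors are near-duplicates of each other and a third is independent.

```python
import numpy as np

def variance_inflation_factors(X):
    """VIF_i = 1/(1 - R_i^2), equation (6): for standardized regressors
    these are the diagonal elements of the inverted correlation matrix."""
    R = np.corrcoef(X, rowvar=False)   # correlation matrix of the columns
    return np.diag(np.linalg.inv(R))

rng = np.random.default_rng(1)
x1 = rng.normal(size=100)
x2 = x1 + rng.normal(scale=0.05, size=100)   # nearly collinear with x1
x3 = rng.normal(size=100)                    # independent regressor
vifs = variance_inflation_factors(np.column_stack([x1, x2, x3]))
```

The collinear pair produces VIFs far above the threshold of 10 cited from Freund and Littell, while the independent regressor stays near 1.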

The ridge regression methodology demonstrated here will add bias, k, to the trace elements of the correlation matrix. If any trace element of the inverse correlation matrix is less than one, no bias is added. This procedure is repeated until all trace values are equal to one. This insures that the collinearity of each variable is treated separately and that only those variables demonstrating collinearity will receive bias. In addition, the relationships of the variables are maintained to assure correct "balance," thereby avoiding distortion of the original hypothesis. Another value of importance to be calculated is P* (P star), which determines the correct amount of bias for establishing the best regression equation. (14)

    P* = p - tr{[(X'X + kI)⁻¹(X'X) - I]'[(X'X + kI)⁻¹(X'X) - I]}    (7)

Since I is a p × p matrix, where p is the number of independent variables in the model, tr(I) = p. Clearly P* ≤ p, which suggests that the violation of the assumption about the interrelationship of the independent variables in the model has not been ignored, but rather accounted for. The quantity P* could therefore be thought of as the "effective" number of independent variables in the model.

Another important quantity to be calculated is the trace of the inverted design matrix, TR*:

    TR* = tr[(X'X + kI)⁻¹(X'X)(X'X + kI)⁻¹]                         (8)

Both P* and TR* are calculated for a whole range of the parameter k, starting with k = 0. The optimal value for k is the value at which the following equality is reached, or approached as closely as feasible:

    TR* = P*                                                         (9)

A SAS IML software program was used to compute results following the above procedures. This program appears in Appendix A to this paper. (15)

3.0 Results

Let us now apply ridge regression techniques to the China example of Section 2. First consider the ordinary least squares results operating upon the dependent variable Ln(GDP):

ORDINARY LEAST SQUARES RESULTS
CHINA (1961 to 1985)

   B0     Ln(ENG)  Ln(TEL)  Ln(AIR)  Ln(INV)  Ln(SAV)
 5.715    0.0388  -0.0475   0.3047  -0.5911   0.9865

Using the above coefficients for each infrastructure component generates erroneous results. For a growing economy the signs of the coefficients should be positive. A negative coefficient value is expected only if a segment of an economy is flat or declines (negative slope). Also, the large coefficients are misleading as indicators of the contribution being made to the dependent variable. China's economy shows increasing growth based on the gross domestic product per capita:

CHINA (GDP PER CAPITA)

  1960     1970     1980     1985
 87.22   109.66   275.56   308.73

Using ridge regression a different set of coefficients is obtained. Adding sectors of an economy, while not exceeding the dependent variable, usually results in positive coefficients. Contributions made by one sector upon another should be removed to insure the ceteris paribus requirement, thereby representing the independent contributions to the dependent variable. Imposing these conditions results in the following:

RIDGE REGRESSION RESULTS
CHINA (1961 to 1985)

                      B0      Ln(ENG)  Ln(TEL)  Ln(AIR)  Ln(INV)  Ln(SAV)
Coefficient         5.5153    0.1964   0.0635   0.1445   0.1764   0.1711
T for H0: B = 0              (3.265)  (4.873)  (11.72)  (12.88)  (13.90)
Probability > |T|            (.00407) (.0001)  (.00001) (.00001) (.00001)

The above results show levels of significance below .005 for all independent variables. This measures the probability that a |T| statistic would obtain a value greater than the observed, given that the true parameter is zero. The probability of the T-statistic for energy in China is 0.00407. This means that if we reject the null hypothesis (B = 0), there is a 0.41 percent probability that the null hypothesis is actually true.

The amount of ridge bias necessary to achieve the proper regression result was within "rule of thumb" parameters. (16) The maximum amount of bias added to any element was as follows:

BIAS REQUIRED FOR RIDGE REGRESSION
CHINA 1961-1985

    .275

4.0 Conclusions:

Ridge regression bias intentionally introduced can be kept at low levels and only added to those elements demonstrating collinearity. The mean square error term is kept at low levels, insuring improved results. Ridge regression corrects improper OLS signs, inflated parameter estimates and unstable coefficients.

This study dealt with five variables which show varying degrees of multicollinearity. Ridge regression methodology was employed to deal with multicollinearity in determining model coefficients. One must, however, use caution when making general statements concerning these regression results beyond the variables discussed. Additional work should be undertaken with an expanded economic model of infrastructure components to develop a better understanding of each country.
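The P*/TR* selection rule of equations (7)-(9) can be sketched in Python rather than SAS IML. This is a minimal illustration; the correlation matrix below is a made-up near-singular example, not the China matrix, and the k grid mirrors the paper's 0 to 0.475 by 0.025 scan.

```python
import numpy as np

def ridge_k_by_pstar_trstar(R, k_grid):
    """Scan bias values k and return the one where TR* most nearly
    equals P* (equations 7-9). R is the p x p correlation matrix of
    the standardized regressors."""
    p = R.shape[0]
    I = np.eye(p)
    best_k, best_gap = None, np.inf
    for k in k_grid:
        M = np.linalg.inv(R + k * I)
        H = M @ R                                     # (X'X + kI)^-1 (X'X)
        p_star = p - np.trace((H - I).T @ (H - I))    # equation (7)
        tr_star = np.trace(M @ R @ M)                 # equation (8)
        gap = abs(tr_star - p_star)
        if gap < best_gap:
            best_k, best_gap = k, gap
    return best_k

# illustrative near-singular correlation matrix (not the China data)
R = np.array([[1.00, 0.95, 0.90],
              [0.95, 1.00, 0.93],
              [0.90, 0.93, 1.00]])
k = ridge_k_by_pstar_trstar(R, np.arange(0.0, 0.5, 0.025))
```

With an orthogonal system (R = I) the rule selects k = 0, i.e. plain OLS; with highly intercorrelated regressors it selects a strictly positive bias.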

Using SAS IML® on a personal computer to perform ridge regression of economic data streams provides increased flexibility to the analyst. Care should be given to some of the PC's limitations when using SAS IML. SAS version 6.03 requires more memory than earlier versions, and 640K bytes should be considered the minimum requirement. The number of variables one uses should also be kept at a minimum, and variables should only be added that are important to the economic model. Please note that in the attached program listing, the IML worksize will need to be adjusted upward as variables are added to the model.

In Appendix B, the SAS output window display of the CHINA data is included. It is important to observe the signs of the variables and the statistical measures "PRESS" and "MSE." Appendix C includes the "PRESS" log window results. With each computational loop the effects of bias, k, on the diagonal elements can be observed by the analyst. No equation should be considered the best until all statistical parameters are reviewed. Intermediate knowledge of ridge regression procedural results and underlying data is always useful.
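The "PRESS" measure referred to above can be illustrated for an ordinary least squares fit using the standard leave-one-out identity. The program in Appendix A computes an analogous PRESS-like quantity for the ridge fit; the data below is synthetic, not the China series.

```python
import numpy as np

def press_statistic(X, y, beta):
    """PRESS: sum of squared leave-one-out prediction errors. For OLS
    these equal e_i / (1 - h_ii), with h_ii from the hat matrix."""
    H = X @ np.linalg.inv(X.T @ X) @ X.T          # hat matrix
    loo_resid = (y - X @ beta) / (1.0 - np.diag(H))
    return float(loo_resid @ loo_resid)

# synthetic example with an intercept column (not the China data)
rng = np.random.default_rng(2)
X = np.column_stack([np.ones(12), rng.normal(size=(12, 2))])
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(scale=0.3, size=12)
beta = np.linalg.lstsq(X, y, rcond=None)[0]
press = press_statistic(X, y, beta)
```

Comparing PRESS across bias values k, as the appendix output does, favors the equation that predicts withheld observations best rather than the one that merely fits the estimation sample best.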

NOTES and REFERENCES

SAS/IML® software is the registered trademark of SAS Institute Inc., Cary, NC, USA.

1. deJanosi, Peter E., and Grayson, Leslie E., "Patterns of Energy Consumption and Economic Growth and Structure", Journal of Development Studies, Vol. 8, 1972, pp. 241-249. See also: Todaro, Michael P., Economic Development in the Third World, 1985, pp. 540-542.
2. Dowling, Edward T., Mathematics for Economists, Schaum's Outline Series, McGraw-Hill Book Company, 1980.
3. Draper, N.R., and Smith, H., Applied Regression Analysis, John Wiley & Sons, New York, 1981, pp. 294-379.
4. Cassidy, Henry J., Using Econometrics: A Beginner's Guide, Reston Publishing Co., Inc., New York, 1981, pp. 160-168.
5. Belsley, David A., Kuh, Edwin, and Welsch, Roy E., Regression Diagnostics: Identifying Influential Data and Sources of Collinearity, John Wiley & Sons, New York, 1980, pp. 192-229.
6. Freund, Rudolph J., Ph.D., and Littell, Ramon C., Ph.D., SAS System for Regression, SAS Institute Inc., Cary, NC, 1986.
7. Belsley, op. cit.
8. Hoerl, A. E., and Kennard, R. W., "Ridge Regression: Biased Estimation for Nonorthogonal Problems", Technometrics, No. 12, 1970, pp. 69-82.
9. Mullett, G. M., "Why Regression Coefficients Have the Wrong Sign", Journal of Quality Technology, No. 8, 1976, pp. 112-126.
10. Jelisavcic, Gordana, "Mean Square Error as a Reliability Measure for Biased Estimators", paper delivered at ASA Meeting, August 16-19, 1982, Cincinnati, Ohio.
11. Chatterjee, Samprit, and Price, Bertram, Regression Analysis by Example, John Wiley & Sons, New York, 1977, pp. 143-214.
12. Draper, op. cit.
13. Wonnacott, Thomas H., and Wonnacott, Ronald J., Regression: A Second Course in Statistics, John Wiley & Sons, New York, 1981, pp. 64-448.
14. Personal papers from Gordana Jelisavcic, Ph.D., who has done extensive work on ridge regression and has published several papers and delivered speeches to the American Statistical Association.
15. SAS IML® is a computer software product by SAS Institute, Box 8000, Cary, NC. The original ridge program appeared in course notes of Principles of Regression Analysis and has been modified to include Gordana Jelisavcic, Ph.D., notes.
16. Ridge bias should not exceed the range of 0 to .30.
APPENDIX A
SAS IML® VERSION 6.03
RIDGE REGRESSION COMPUTER PROGRAM
FOR THE PERSONAL COMPUTER

OPTIONS NODATE;
DATA COUNTRY;
  INFILE 'D:CHN.PRN';
  LENGTH YAR AIR ENG GDI GDP GNS TEL 8;
  INPUT YAR AIR ENG GDI GDP GNS TEL;
  LAIR=LOG(AIR);
  LENG=LOG(ENG);
  LGDI=LOG(GDI);
  LGDP=LOG(GDP);
  LGNS=LOG(GNS);
  LTEL=LOG(TEL);
TITLE1 'CHINA DATA (5 VARIABLES)';
/* THE MACRO VARIABLE VARLIST CONTAINS */
/* THE REGRESSOR VARIABLES */
%LET VARLIST=LAIR LENG LGDI LGNS LTEL;
/* THE MACRO VARIABLE DEPVAR CONTAINS THE */
/* DEPENDENT VARIABLE */
%LET DEPVAR=LGDP;
/* THE MACRO VARIABLE DATASET CONTAINS THE NAME */
/* OF THE SAS DATA SET */
%LET DATASET=COUNTRY;
/* THE MACRO VARIABLE OUTDSN CONTAINS THE */
/* OUTPUT DATA SET */
%LET OUTDSN=RIDGE;
/* THE MACRO VARIABLE COEF CONTAINS THE */
/* LABELS FOR COEFFICIENTS */
%LET COEF='B0' 'LAIR' 'LENG' 'LGDI' 'LGNS' 'LTEL';
/* REQUIRED FOR CHARACTER MATRIX OF MODEL */
%LET LABL=CC;
/* THE MACRO VARIABLE BBNAMES CONTAINS THE */
/* NAMES OF THE COEFFICIENTS AND USEFUL STATISTICS */
%LET BBNAMES='PRESSSS' 'ESS' 'MSE' 'CK' 'K' 'TRHK'
             'B0' 'LAIR' 'LENG' 'LGDI' 'LGNS' 'LTEL';
/* USED TO NAME CHARACTER MATRIX OF MODEL */
%LET CCNAMES='LABEL';
PROC PRINT DATA=&DATASET;
/* PROC REG DATA=&DATASET; */
/* MODEL &DEPVAR=&VARLIST/VIF; */
PROC IML WORKSIZE=70;
START RIDGE;
  N=NROW(X);  /* NUMBER OF INPUTS PER VARIABLE */
  P=NCOL(X);  /* NUMBER OF INDEPENDENT VARIABLES */
  JX=J(N,1,1)||X;
  /* COMPUTE ANOVA ESTIMATE OF SIGMA2 */
  SIGMA2=(Y-JX*INV(JX`*JX)*JX`*Y)`
        *(Y-JX*INV(JX`*JX)*JX`*Y)/(N-P-1);
  XM=X[:,];
  YM=Y[:,];
  XC=X-REPEAT(XM,N,1);  /* X'S CENTERED */
  YC=Y-YM;              /* Y'S CENTERED */
  YCPYC=YC`*YC;
  SSXC=XC[##,];
  STDXC=SQRT(SSXC);
  XCS=XC*DIAG(1/STDXC); /* X'S ARE CENTERED AND SCALED */
  XCSPXCS=XCS`*XCS;     /* CORRELATION MATRIX */
  XCSPYC=XCS`*YC;
  LABL=J(1,1);          /* USED TO MAKE NUMERIC MATRIX */
  LABL=CHAR(LABL);      /* USED TO CHANGE NUMERIC TO CHARACTER MATRIX */
  ZA=J(P,1,0);          /* USED TO MAKE NUMERIC MATRIX */
  STARTT=0;       /* START OF BIAS TO BE SET */
  ENDD=.50;       /* END OF BIAS LOOP TO BE SET */
  INCREMNT=.025;  /* INCREMENT OF BIAS TO BE SET */
  DO K=STARTT TO ENDD BY INCREMNT;
    IF K=0 THEN RMAT=K#I(P);
    ELSE RMAT=ZC;
    ZB=VECDIAG(RMAT);
    XCSPXCSK=XCSPXCS+RMAT;
    MATINV=INV(XCSPXCSK);   /* CORR MATRIX + BIAS INVERTED */
    MATCOR=MATINV*XCSPXCS;  /* VALUE OF M(Z) */
    IOLSMAT=MATCOR-I(P);    /* VALUES OF MM(Z) */
    TRMAT=IOLSMAT`*IOLSMAT; /* VALUE OF MM(Z)'MM(Z) */
    TROFMAT=DIAG(TRMAT);    /* DIAG OF MATRIX TR */
    GARY=VECDIAG(TROFMAT);  /* COLUMN VECTOR OF DIAG TR VALUES */
    IOLSTRMA=I(P)-TROFMAT;  /* VALUE OF P MATRIX FORM */
    STRIG=VECDIAG(IOLSTRMA);/* COLUMN VECTOR OF DIAG P VALUES */
    VIFMZMZK=MATINV*MATCOR;
    MZMZKVIF=VECDIAG(VIFMZMZK); /* VIF (TR) VALUES */
    RVALUE=MZMZKVIF/STRIG;  /* RVALUE MUST BE KEPT AT UPPER BOUND >1 */
    TR=TRACE(VIFMZMZK);
    PSTAR=TRACE(IOLSTRMA);
    R=TR/P;
    KBIAS=VECDIAG(RMAT);
    IF ANY(RVALUE>1) THEN LABEL="OVER";
    ELSE LABEL="UNDER";
    LABL=LABEL;
    PRINT RVALUE KBIAS PSTAR TR R LABEL;
    /* THIS DO LOOP IS REQUIRED TO INCREASE THE KBIAS ON ONLY
       THOSE ELEMENTS WHERE THE VARIANCE INFLATION FACTOR
       (RVALUE) IS ABOVE 1, THEN ADDS EQUAL BIAS TO ALL ELEMENTS
       AFTER (RVALUE) IS LESS THAN 1 */
    DO I=1 TO P BY 1;
      IF RVALUE[I,1]>1 THEN ZA[I,1]=K+INCREMNT;
      ELSE ZA[I,1]=ZB[I,1];
    END;
    IF ALL(RVALUE<1) THEN DO I=1 TO P BY 1; /* DO LOOP FOR RVALUE */
      ZA[I,1]=K+INCREMNT;                   /* ALL UNDER 1 */
    END;
    ZC=DIAG(ZA);
    RMAT=ZC;
    SBETA=MATINV*XCSPYC;
    /* COMPUTE UNSTANDARDIZED REGRESSION COEFFICIENTS */
    SXC=1/STDXC;
    BETA=SXC`#SBETA;
    B0=BETA`*XM`;
    INTERCPT=YM-B0;
    BETA=INTERCPT//BETA;
    /* PREPARE FOR COMPUTING A PRESS-LIKE STATISTIC */
    /* THIS STATISTIC PR(RIDGE) IS EASIER */
    /* TO COMPUTE THAN PRESS */
    RESID=Y-JX*BETA;
    ESS=RESID[##,];
    MSE=ESS/(N-P-1);
    /* HAT MATRIX USING K */
    HK=XCS*INV(XCSPXCSK)*XCS`;
    TRHK=TRACE(HK);
    /* COMPUTE CP TYPE STATISTIC */
    CK=ESS/SIGMA2-N+2+2#TRHK;
    PRESSRES=RESID/(1-VECDIAG(HK)-1/N);
    PRESSSS=PRESSRES[##,];
    BB=BB//(PRESSSS||ESS||MSE||CK||K||TRHK||BETA`);
    CC=CC//(LABL); /* CREATES A VERTICAL CHARACTER MATRIX OF LABEL */
  END;
FINISH;
START ANOVA;
  /* BUILD ANOVA TABLE BASED ON SMALLEST PRESS */
  A=NROW(CC); /* READS THE AMOUNT OF COMPUTATIONS IN LABL */
  FF=0;       /* USED AS STARTING POINT FOR CALCULATIONS */
  CC[1,1]="O.L.S.";  /* CHANGES LABEL TO READ OLS */
  DO I=1 TO A BY 1;                    /* THIS DO LOOP IS REQUIRED */
    IF CC[I,1]="UNDER" THEN FF=1+FF;   /* SHOW THE POINT WHERE */
    IF FF=1 THEN CC[I-1,1]="BEST";     /* MULTICOLLINEARITY IS OUT */
  END;                                 /* BASED ON R(VALUE) */
  DO I=1 TO A BY 1;                    /* DO LOOP REQUIRED TO READ */
    IF CC[I,1]="BEST" THEN EE=I;       /* THE BEST DATA SET */
  END;
  USE &OUTDSN VAR{K PRESSSS MSE ESS CK TRHK B0 &VARLIST};
  READ POINT EE;
  PRINT ,K PRESSSS MSE ESS CK TRHK B0, &VARLIST;
  XCSPXCSK=XCSPXCS+K#I(P);
  VARCOV=INV(XCSPXCSK)*(XCS`*XCS)*INV(XCSPXCSK);
  VIF=VECDIAG(VARCOV);
  SE=SQRT(MSE#VIF);
  SBETA=INV(XCSPXCSK)*XCSPYC;
  T=SBETA/SE;
  PVALUE=(1-PROBT(ABS(T),N-P-1))#2;
  COEF={&COEF}; COEF=COEF[2:P+1,];
  PRINT ,COEF T PVALUE VIF;
FINISH;
/* MAIN PROGRAM */
USE &DATASET;
READ ALL VAR{&VARLIST} INTO X;
READ ALL VAR{&DEPVAR} INTO Y;
RUN RIDGE; /* RUN RIDGE PROGRAM */
/* SORT BB BY PRESSSS */
/* TBB=BB;
   BB[RANK(BB[,1]),]=TBB;
   FREE TBB; */
CREATE &OUTDSN FROM BB[COLNAME={&BBNAMES}];
APPEND FROM BB; /* CREATE OUTPUT DATA SET */
CREATE C FROM CC[COLNAME={&CCNAMES}];
APPEND FROM CC; /* CREATE CHARACTER OUTPUT DATA SET */
RUN ANOVA; /* CREATE NEW T STATISTICS AND VIFS */
QUIT;
DATA D; MERGE &OUTDSN C;
PROC PRINT DATA=D;
/* PROC PLOT DATA=D;
   PLOT (&VARLIST)*K; */
RUN;

APPENDIX B
SAS OUTPUT
ORIGINAL DATA SET and RIDGE REGRESSION RESULTS

CHINA INPUT DATA (5 VARIABLES)

[Data listing: 25 annual observations, 1961-1985, of the variables AIR, ENG, GDI, GDP, GNS and TEL together with their logarithms. The multi-column listing is not reproduced legibly in this copy.]

CHINA RIDGE REGRESSION RESULTS (5 VARIABLES)

[Ridge trace: for each bias value K = 0.000 to 0.475 by 0.025, the output lists PRESSSS, ESS, MSE, CK, TRHK and the coefficients B0, LAIR, LENG, LGDI, LGNS and LTEL, with a label column marking K = 0 as "O.L.S.", iterations with any RVALUE above 1 as "OVER", K = 0.275 as "BEST", and later iterations as "UNDER". The row selected as BEST is:]

  K      PRESSSS   ESS       MSE       CK       TRHK     B0       LAIR     LENG     LGDI     LGNS     LTEL
 0.275   0.32419   0.26141   0.013758  58.203   1.6489   5.5153   0.1445   0.1964   0.1764   0.1711   0.0635

APPENDIX C
SAS LOG OUTPUT

1. THE LOG OUTPUT SHOWS EACH TIME BIAS (k) IS ADDED.
2. ANOVA RESULTS OF THE BEST RIDGE REGRESSION EQUATION APPEAR AT THE END.

[At each bias step the log prints the column vectors RVALUE and KBIAS together with PSTAR, TR and the R LABEL ("OVER" while any RVALUE exceeds 1, "UNDER" once all fall below 1). The step-by-step columns are not reproduced legibly in this copy. The final ANOVA output for the BEST equation prints:]

K = 0.275  PRESSSS = 0.3241925  MSE = 0.0137583  ESS = 0.2614074  CK = 58.203402  TRHK = 1.6488649

B0 = 5.5153418  LAIR = 0.1444754  LENG = 0.1963999  LGDI = 0.1764421  LGNS = 0.1710571  LTEL = 0.0635369

COEF       T           PVALUE
LAIR   11.720654    3.853E-10
LENG    3.2651576   0.0040725
LGDI   12.881383    7.756E-11
LGNS   13.903939    2.075E-11
LTEL    4.8729964   0.0001056

VIF: 0.456007  0.643119  0.2229594  0.2042108  0.5339643

Gary William Carr
NYNEX Corporation
335 Madison Ave., Room 2104
New York, NY 10017
(212) 310-7459
