Chapter 4 Transformations and Weighting to Correct Model Inadequacies
UECM2263 Applied Statistical Model
Recall that regression model fitting has several implicit assumptions, including the following:
1. The model errors have mean zero and constant variance and are uncorrelated.
2. The model errors have a normal distribution. This assumption is made in order to conduct
hypothesis tests and construct CIs. Under this assumption, the errors are independent.
3. The form of the model, including the specification of the regressors, is correct.
Chapter 3 presented several techniques for checking the adequacy of the linear regression model. If the
linear regression model is not appropriate for a data set, there are two basic choices:
1. Abandon the regression model and develop a more appropriate model.
2. Employ some transformation on the data so that the regression model is appropriate for the
transformed data.
We consider the use of transformations in this chapter.
4.1 Variance Stabilizing Transformation
The assumption of constant variance is a basic requirement of regression analysis. A common reason
for the violation of this assumption is for the response variable Y to follow a probability distribution in
which the variance is functionally related to the mean.
For example, if Y follows a Poisson distribution with mean $\lambda$, the variance of Y is equal to its
mean $\lambda$. Since the mean of Y is related to the regressor variable X, the variance of Y will also
change with X (it is proportional to X when the mean is).
Example 4.1:
Consider the simple linear regression model $y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$, where
$\mathrm{Var}(\varepsilon_i) = \sigma^2 x_i$. Suppose we use the transformation $Y' = Y/\sqrt{X}$.
Is this a variance-stabilizing transformation?
$\mathrm{Var}(Y_i) = \mathrm{Var}(\varepsilon_i) = \sigma^2 x_i$
$\mathrm{Var}(Y_i') = \mathrm{Var}\left(Y_i/\sqrt{x_i}\right) = \left(1/\sqrt{x_i}\right)^2 \mathrm{Var}(Y_i) = (1/x_i)\,\sigma^2 x_i = \sigma^2$
Yes, the variance of Y' is constant.
Unequal error variances and non-normality of the error terms frequently appear together. To
remedy these departures from the linear regression model, we need a transformation on Y, since
the shape and spread of the distributions of Y need to be changed.
A transformation on Y may at the same time help to linearize a curvilinear regression relation.
Figure 4.1 below contains some prototype regression relations where the skewness and error variance
increase with the mean response $E(Y)$.
Figure 4.1: Prototype Regression Pattern
Transformations on Y: $Y' = \sqrt{Y}$, $Y' = \log_{10} Y$, $Y' = 1/Y$
Note: A simultaneous transformation on X may also be helpful or necessary.
Useful Variance-Stabilizing Transformations:
Relationship of $\sigma^2$ to $E(Y)$ | Transformation
$\sigma^2 \propto$ constant | $Y' = Y$ (no transformation)
$\sigma^2 \propto E(Y)$ | $Y' = \sqrt{Y}$ (square root; Poisson data)
$\sigma^2 \propto E(Y)[1 - E(Y)]$ | $Y' = \sin^{-1}(\sqrt{Y})$ (arcsin; binomial proportions, $0 \le Y_i \le 1$)
$\sigma^2 \propto [E(Y)]^2$ | $Y' = \ln(Y)$ (natural log)
$\sigma^2 \propto [E(Y)]^3$ | $Y' = Y^{-1/2}$ (reciprocal square root)
$\sigma^2 \propto [E(Y)]^4$ | $Y' = Y^{-1}$ (reciprocal)
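Why these pairings stabilize the variance can be sketched with a first-order (delta-method) approximation. This is a standard argument rather than something the notes spell out; the constant $c$, the function $g$ and the symbol $\mu$ are notation introduced here only for illustration.

```latex
\operatorname{Var}[g(Y)] \approx [g'(\mu)]^2 \operatorname{Var}(Y), \qquad \mu = E(Y).

\text{If } \operatorname{Var}(Y) = c\,\mu \text{ and } g(Y) = \sqrt{Y}: \quad
g'(\mu) = \frac{1}{2\sqrt{\mu}}, \quad
\operatorname{Var}(\sqrt{Y}) \approx \frac{1}{4\mu}\, c\,\mu = \frac{c}{4}.

\text{If } \operatorname{Var}(Y) = c\,\mu^2 \text{ and } g(Y) = \ln Y: \quad
g'(\mu) = \frac{1}{\mu}, \quad
\operatorname{Var}(\ln Y) \approx \frac{1}{\mu^2}\, c\,\mu^2 = c.
```

In both cases the approximate variance of the transformed response no longer depends on $\mu$, which is what the corresponding rows of the table express.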
Example 4.2:
Data on age (X) and plasma level of a polyamine (Y) for 25 healthy children in a study
are presented below as R code:
Age <- c(0,0,0,0,0,1,1,1,1,1,2,2,2,2,2,3,3,3,3,3,4,4,4,4,4)
Plasma <- c(13.44,12.84,11.91,20.09,15.60,10.11,11.38,10.28,8.96,
8.59,9.83,9,8.65,7.85,8.88,7.94,6.01,5.14,6.9,6.77,4.86,5.1,5.67,5.75,6.23)
# Use the lm() function to fit the model
Blood.Reg <- lm(Plasma ~ Age)
# Create the scatter plot
plot(x = Age, y = Plasma, xlab = "Age", ylab = "Plasma",
     main = "Plasma Level vs. Age Before Transformation",
     col = "red", pch = 19, cex = 1.5)
[Figure: Plasma Level vs. Age Before Transformation (x-axis: Age; y-axis: Plasma)]
The scatter plot indicates a curvilinear regression relationship, as well as greater variability for
younger children than for older ones.
Based on the prototype regression pattern, we shall first try the logarithmic transformation, $Y' = \log_{10} Y$.
# Create the scatter plot after transformation
LY <- log10(Plasma)
plot(x = Age, y = LY, xlab = "Age", ylab = "log10(Plasma)",
     main = "log10(Plasma Level) vs. Age After Transformation",
     col = "red", pch = 19, cex = 1.2)
Note that the transformation has not only led to a reasonably linear regression relation, but the variability
at the different levels of X has also become reasonably constant.
To further examine the reasonableness of the transformation $Y' = \log_{10} Y$, we fitted the simple linear
regression model to the transformed Y' data and obtained:
$\hat{Y}' = 1.135 - 0.1023X$
# Fit the model Y' = log10(Y) vs. X (the formula is passed directly, not wrapped in I())
BloodT.Reg <- lm(log10(Plasma) ~ Age)
summary(BloodT.Reg)
# Create plot of residuals vs. Age after transformation
plot(x = Age, y = BloodT.Reg$residuals, xlab = "Age", ylab = "Residuals",
     main = "Residuals vs. Age after Transformation (y = log10(Y))",
     col = "blue", pch = 19, cex = 1.5,
     panel.first = grid(col = "gray", lty = "dotted"))
abline(h = 0, col = "red")
# Normal probability plot after transformation
qqt.plot <- qqnorm(BloodT.Reg$residuals, main = "Normal Probability Plot After Transformation",
                   xlab = "Theoretical Quantiles", ylab = "Sample Quantiles", plot.it = TRUE,
                   col = "blue", pch = 19, cex = 1.5,
                   panel.first = grid(col = "gray", lty = "dotted"))
abline(lm(qqt.plot$y ~ qqt.plot$x))
A plot of residuals against X and a normal probability plot after the transformation are shown below.
Both plots support the appropriateness of the linear regression model for the transformed Y' data.
[Figure: Residuals vs. Age after Transformation (y = log10(Y)) (x-axis: Age; y-axis: Residuals)]
[Figure: Normal Probability Plot After Transformation (x-axis: Theoretical Quantiles; y-axis: Sample Quantiles)]
4.1.1 Transformations on Y : The Box-Cox Method
It is often difficult to determine from diagnostic plots, such as the one in the plasma levels example,
which transformation of Y is most appropriate for correcting skewness of the distributions of error
terms, unequal error variances, and nonlinearity of regression function. The Box-Cox procedure
automatically identifies a transformation from the family of power transformations on Y .
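Consider the transformed regression model
$Y_i^{(\lambda)} = \beta_0 + \beta_1 x_i + \varepsilon_i$
where
$Y^{(\lambda)} = (Y^{\lambda} - 1)/\lambda$ if $\lambda \neq 0$, and $Y^{(\lambda)} = \log_e(Y)$ if $\lambda = 0$.
This definition was given by Box and Cox (1964). Due to the structure of a linear regression model, one
can equivalently express this as
$Y^{(\lambda)} = Y^{\lambda}$ if $\lambda \neq 0$, and $Y^{(\lambda)} = \log_e(Y)$ if $\lambda = 0$.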
With this model, there is an extra parameter, $\lambda$, that needs to be estimated. $\lambda$, $\beta_0$, $\beta_1$, and $\sigma^2$ can be
estimated via maximum likelihood estimation. The estimated $\lambda$ can then be used to suggest the type
of transformation. For example,
$\lambda = 2$: $Y' = Y^2$
$\lambda = 0.5$: $Y' = \sqrt{Y}$
$\lambda = 0$: $Y' = \ln Y$ (by definition)
$\lambda = -0.5$: $Y' = 1/\sqrt{Y}$
$\lambda = -1.0$: $Y' = 1/Y$
Notice that if $\lambda$ is estimated to be 1, no transformation is needed. The estimate for $\lambda$ is commonly searched
for in the range of -2 to 2.
The MLE of $\lambda$ corresponds to the value of $\lambda$ for which the residual sum of squares from the fitted
model, $SS_E(\lambda)$, is minimum. It is usually determined by plotting $SS_E(\lambda)$ versus $\lambda$. Usually 10 to 20
values of $\lambda$ are sufficient for estimation of the optimum value.
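The grid search described above can be sketched in R for the plasma data of Example 4.2. One point the description leaves implicit is that error sums of squares are only comparable across $\lambda$ when the transformed response is put on a common scale; the sketch assumes the usual standardization by the geometric mean of Y. The helper names `boxcox_std` and `sse_lambda` are hypothetical and introduced only for this illustration.

```r
# Plasma data from Example 4.2
Age <- c(0,0,0,0,0,1,1,1,1,1,2,2,2,2,2,3,3,3,3,3,4,4,4,4,4)
Plasma <- c(13.44,12.84,11.91,20.09,15.60,10.11,11.38,10.28,8.96,
            8.59,9.83,9,8.65,7.85,8.88,7.94,6.01,5.14,6.9,6.77,
            4.86,5.1,5.67,5.75,6.23)

# Standardized Box-Cox transform (assumed form): dividing by
# lambda * K2^(lambda - 1), with K2 the geometric mean of y, puts the
# error sums of squares for different lambda on a comparable scale.
boxcox_std <- function(y, lambda) {
  k2 <- exp(mean(log(y)))                      # geometric mean of y
  if (abs(lambda) < 1e-8) {
    k2 * log(y)                                # limiting case lambda = 0
  } else {
    (y^lambda - 1) / (lambda * k2^(lambda - 1))
  }
}

# SSE of the simple linear regression of the transformed response on x
sse_lambda <- function(lambda, y, x) {
  w <- boxcox_std(y, lambda)
  deviance(lm(w ~ x))                          # residual sum of squares
}

lambda_grid <- round(seq(-1, 1, by = 0.1), 1)
sse <- sapply(lambda_grid, sse_lambda, y = Plasma, x = Age)
print(data.frame(lambda = lambda_grid, SSE = round(sse, 1)))

# Grid value with the smallest SSE
lambda_best <- lambda_grid[which.min(sse)]
lambda_best
```

At $\lambda = 1$ the standardized response is just Y - 1, so the SSE there equals that of the untransformed fit; the remaining values can be compared with the Box-Cox results reported for Example 4.2.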
From Example 4.2, the Box-Cox results show:
$\lambda$ | $SS_E(\lambda)$ | $\lambda$ | $SS_E(\lambda)$
1.0 | 78.0 | -0.1 | 33.1
0.9 | 70.4 | -0.3 | 31.2
0.7 | 57.8 | -0.4 | 30.7
0.5 | 48.4 | -0.5 | 30.6
0.3 | 41.4 | -0.6 | 30.7
0.1 | 36.4 | -0.7 | 31.1
0 | 34.5 | -0.9 | 32.7
 | | -1.0 | 33.9
Note that $\hat{\lambda} = -0.5$, with $SS_E(\hat{\lambda}) = 30.6$.
Besides $Y' = \log_{10} Y$, another choice is $Y' = 1/\sqrt{Y}$.
Another approach using R code
Example 4.4:
The trees data set ships with base R (in the datasets package); the boxcox() function used below is in the
MASS package. See help(trees) for specific information on the data set.
Let Y = volume and X = height for the trees in the sample.
R-Codes
library(MASS)   # provides boxcox(); R is case-sensitive, so Library() would fail
trees           # print the data set
mod.fit<-lm(formula = Volume ~ Height, data=trees)
summary(mod.fit)
#Plot of Y vs. X with sample model
plot(x = trees$Height, y = trees$Volume, xlab = "Height",
ylab = "Volume", main = "Volume vs. Height",
panel.first = grid(col = "gray", lty = "dotted"))
abline(mod.fit)
#e.i vs. Yhat.i
plot(x = mod.fit$fitted.values, y = mod.fit$residuals,
xlab = expression(hat(Y)), ylab = "Residual",
main = expression(paste("Residuals vs. ", hat(Y))),
panel.first = grid(col = "gray", lty = "dotted"))
abline(h = 0, col = "red")
#Determine lambda.hat In MASS package
save.bc<-boxcox(object = mod.fit, lambda = seq(from = -2,to = 2, by = 0.01))
title(main = "Box-Cox transformation plot")
lambda.hat<-save.bc$x[save.bc$y == max(save.bc$y)]
lambda.hat
[Figure: Volume vs. Height (x-axis: Height; y-axis: Volume)]
Notice that the variability in the $y_i$'s increases as $x_i$ increases.
[Figure: Residuals vs. $\hat{Y}$ (x-axis: $\hat{Y}$; y-axis: Residual)]
The funnel shape occurs here. Based upon this and the scatter plot, it would be of interest to
consider a transformation of Y .
Also, notice the use of hat(Y) and the expression() function in the plot() function. Use demo(plotmath)
for more information about how to get mathematical symbols in plots.
Note:
The function expression returns a vector of type "expression" containing its arguments (unevaluated)
lambda.hat
[1] -0.19
[Figure: Box-Cox transformation plot (x-axis: lambda from -2 to 2; y-axis: log-Likelihood; the 95% confidence interval is marked)]
The boxcox() function estimates $\lambda$ using maximum likelihood estimation.
Here, it shows the log-likelihood function is maximized when $\lambda$ = -0.19. It also gives a likelihood-based
95% confidence interval of about -0.8 to 0.4 for $\lambda$. Notice that $\lambda$ = 0 is in the interval (may want
to consider the natural log transformation), and notice that $\lambda$ = 1 is not in the interval (transformation needed).
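The interval read off the plot can also be computed from the values that boxcox() returns. The sketch below assumes the usual likelihood-ratio rule (keep every $\lambda$ whose profile log-likelihood lies within `qchisq(0.95, 1) / 2` of the maximum); the names `bc`, `cutoff` and `ci` are introduced here for illustration only.

```r
library(MASS)   # boxcox()

mod.fit <- lm(Volume ~ Height, data = trees)

# plotit = FALSE returns the lambda grid (x) and profile log-likelihood (y)
# without drawing the plot
bc <- boxcox(mod.fit, lambda = seq(-2, 2, by = 0.01), plotit = FALSE)

lambda.hat <- bc$x[which.max(bc$y)]

# Likelihood-ratio cutoff for an approximate 95% confidence interval
cutoff <- max(bc$y) - qchisq(0.95, df = 1) / 2
ci <- range(bc$x[bc$y >= cutoff])

lambda.hat
ci
```

Checking whether 0 or 1 falls inside `ci` then gives the same reading as the plot: 0 inside suggests the log transformation is plausible, 1 outside suggests that some transformation is needed.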
Using $\hat{\lambda} = -0.19$, the fitted model is
$\hat{Y}^{-0.19} = 0.9595 - 0.005526 \times \text{Height}$
How would you find $\hat{Y}$?
$\hat{Y} = (0.9595 - 0.005526 \times \text{Height})^{1/(-0.19)}$
Since = 0 is in the interval, it may be of interest to try the natural log transformation since this is
easier to interpret (and more common).
R-Codes
mod.fit3<-lm(formula = log(Volume) ~ Height, data = trees)
summary(mod.fit3)
plot(x = mod.fit3$fitted.values, y =
mod.fit3$residuals, xlab = "log(Y)", ylab =
"Residual", main = "Residuals vs. log(Y)",
panel.first = grid(col = "gray", lty = "dotted"))
abline(h = 0, col = "red")
Call:
lm(formula = log(Volume) ~ Height, data = trees)
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.79652 0.89053 -0.894 0.378
Height 0.05354 0.01168 4.585 8.03e-05 ***
[Figure: Residuals vs. $\log \hat{Y}$ (x-axis: $\log \hat{Y}$; y-axis: Residual)]
The natural log transformation works as well. This sample model can be expressed as
$\widehat{\ln Y} = -0.7965 + 0.05354X$, that is, $\hat{Y} = e^{-0.7965 + 0.05354X}$.
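As a small illustration of the back-transformation, the sketch below predicts on the log scale and then exponentiates. The heights in `new.heights` are hypothetical values chosen only for the example.

```r
mod.fit3 <- lm(log(Volume) ~ Height, data = trees)
b <- coef(mod.fit3)

# Hypothetical heights chosen only for illustration
new.heights <- data.frame(Height = c(70, 75, 80))

# Predictions on the log scale, then back-transformed to volume units
log.pred <- predict(mod.fit3, newdata = new.heights)
vol.pred <- exp(log.pred)

# The same values written out from the coefficients: exp(b0 + b1 * X)
vol.manual <- exp(b[1] + b[2] * new.heights$Height)

cbind(new.heights, vol.pred, vol.manual)
```

Both columns should agree, since `exp(predict(...))` is simply the fitted equation evaluated at the new heights.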
4.2 Transformations to Linearize the Model
When the distributions of the error terms are reasonably close to normal and have constant
variance, transformations on X should be attempted. The reason why transformations on Y may
not be desirable here is that a transformation on Y, such as $Y' = \sqrt{Y}$, may change the shape of the
distribution of the error terms away from the normal distribution and may also lead to substantially differing error
term variances.
Figure 4.2: Prototype Regression Patterns and Transformations of X
First pattern: $X' = \log_{10} X$ or $X' = \sqrt{X}$
Second pattern: $X' = X^2$ or $X' = \exp(X)$
Third pattern: $X' = 1/X$ or $X' = \exp(-X)$
Example 4.3:
Data from an experiment on the effect of number of days of training received ( X ) on performance(Y )
in a battery of simulated sales situations are presented below:
Train <- c(.5,.5,1,1,1.5,1.5,2,2,2.5,2.5)
Score <- c(42.5,50.6,68.5,80.7,89,99.6,105.3,111.8,112.3,125.7)
perf.Reg <- lm(Score~Train)
# Create scatter plot of Training vs. Score before transformation
plot(x = Train, y = Score, xlab = "Training", ylab = "Performance",
     main = "Training vs. Performance before Transformation",
     col = "blue", pch = 19, cex = 1.5)
abline(perf.Reg)
# Create plot of residuals vs. predicted values before transformation
plot(x = perf.Reg$fitted.values, y = perf.Reg$residuals, xlab = "Predicted Values", ylab = "Residuals",
     main = "Residuals vs. Predicted Values Before Transformation", col = "blue", pch = 19, cex = 1.5,
     panel.first = grid(col = "gray", lty = "dotted"))
abline(h = 0, col = "red")
# Normal probability plot before transformation
qq.plot <- qqnorm(perf.Reg$residuals, main = "Normal Probability Plot Before Transformation",
                  xlab = "Theoretical Quantiles", ylab = "Sample Quantiles", plot.it = TRUE,
                  col = "blue", pch = 19, cex = 1.5,
                  panel.first = grid(col = "gray", lty = "dotted"))
abline(lm(qq.plot$y ~ qq.plot$x))
[Figure: Training vs. Performance before Transformation (x-axis: Training; y-axis: Performance)]
[Figure: Residuals vs. Predicted Values Before Transformation (x-axis: Predicted Values; y-axis: Residuals)]
[Figure: Normal Probability Plot Before Transformation (x-axis: Theoretical Quantiles; y-axis: Sample Quantiles)]
The scatter plot indicates that the relation appears to be fairly curvilinear. Since the variability at
the different X levels appears to be fairly constant, we shall consider a transformation on X .
Based on the prototype plot, we shall consider initially the square root transformation $X' = \sqrt{X}$.
# Create scatter plot of sqrt(Training) vs. Score after transformation
XP <- sqrt(Train)
plot(x = XP, y = Score, xlab = "Sqrt(Training)", ylab = "Performance",
     main = "Sqrt(Training) vs. Performance after Transformation",
     col = "blue", pch = 19, cex = 1.5)
# Fit the model y vs. sqrt(x)
perfT.Reg <- lm(Score ~ I(sqrt(Train)))
summary(perfT.Reg)
# Create plot of residuals vs. predicted values after transformation
plot(x = perfT.Reg$fitted.values, y = perfT.Reg$residuals, xlab = "Predicted Values", ylab = "Residuals",
     main = "Residuals vs. Predicted Values After Transformation", col = "blue", pch = 19, cex = 1.5,
     panel.first = grid(col = "gray", lty = "dotted"))
abline(h = 0, col = "red")
# Normal probability plot after transformation
qqt.plot <- qqnorm(perfT.Reg$residuals, main = "Normal Probability Plot After Transformation (x' = sqrt(x))",
                   xlab = "Theoretical Quantiles", ylab = "Sample Quantiles", plot.it = TRUE,
                   col = "blue", pch = 19, cex = 1.5,
                   panel.first = grid(col = "gray", lty = "dotted"))
abline(lm(qqt.plot$y ~ qqt.plot$x))
[Figure: Sqrt(Training) vs. Performance after Transformation (x-axis: Sqrt(Training); y-axis: Performance)]
[Figure: Residuals vs. Predicted Values After Transformation (x-axis: Predicted Values; y-axis: Residuals)]
[Figure: Normal Probability Plot After Transformation (x' = sqrt(x)) (x-axis: Theoretical Quantiles; y-axis: Sample Quantiles)]
Note that the scatter plot of Y versus X' shows a reasonably linear relation. The variability
of the scatter plot at the different X levels is the same as before, since Y itself was not transformed.
The plot of residuals against X' shows no evidence of unequal error variances. The normal probability
plot after transformation also shows no indications of substantial departures from normality. Thus the
simple linear regression model $Y = \beta_0 + \beta_1 X' + \varepsilon$ appears to be appropriate here.
Fitting the model using the transformed data, we obtain:
$\hat{Y} = -10.33 + 83.45X'$
4.3.2 Transformations on the Predictor variable ( X ): The Box and Tidwell Method
Suppose that the relationship between Y and one or more of the regressor variables is nonlinear but
that the usual assumptions of normally and independently distributed responses with constant
variance are at least approximately satisfied. We want to select an appropriate transformation on
the regressor variables so that the relationship between y and the transformed regressor is as
simple as possible.
Box and Tidwell describe an analytical procedure for determining the form of the transformation on X.
Assume that the response variable Y is related to a power of the regressor, say $\xi = X^{\alpha}$, as
$E(Y) = f(\xi, \beta_0, \beta_1) = \beta_0 + \beta_1 \xi$, where
$\xi = X^{\alpha}$ if $\alpha \neq 0$, and $\xi = \ln X$ if $\alpha = 0$,
and $\beta_0$, $\beta_1$ and $\alpha$ are unknown parameters.
The procedure is:
Let $\alpha_0 = 1$ be the initial guess of $\alpha$, so that $\xi_0 = X^{\alpha_0} = X$, i.e. no transformation at all is applied in
the first iteration.
Expanding about the initial guess in a Taylor series and ignoring terms of higher than first order:
$E(Y) = f(\xi_0, \beta_0, \beta_1) + (\alpha - \alpha_0)\left\{\frac{df(\xi, \beta_0, \beta_1)}{d\alpha}\right\}_{\xi = \xi_0,\ \alpha = \alpha_0}$
$= \beta_0 + \beta_1 X + (\alpha - 1)\left\{\frac{df(\xi, \beta_0, \beta_1)}{d\alpha}\right\}_{\xi = \xi_0,\ \alpha = \alpha_0}$
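Note:
If the term in braces were known, it could be treated as an additional regressor variable, and it would be
possible to estimate the parameters $\beta_0$, $\beta_1$ and $\alpha$ by least squares.
$\left\{\frac{df(\xi, \beta_0, \beta_1)}{d\alpha}\right\}_{\xi = \xi_0,\ \alpha = \alpha_0} = \left\{\frac{df(\xi, \beta_0, \beta_1)}{d\xi}\right\}_{\xi = \xi_0} \left\{\frac{d\xi}{d\alpha}\right\}_{\alpha = \alpha_0}$
$= \frac{d(\beta_0 + \beta_1 X)}{dX} \cdot \left.\frac{d(X^{\alpha})}{d\alpha}\right|_{\alpha = \alpha_0} = \beta_1 X \ln(X)$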
Thus,
$E(Y) = \beta_0^* + \beta_1^* X + (\alpha - 1)\beta_1 X \ln(X)$
$= \beta_0^* + \beta_1^* X + \beta_2^* W$
where $\beta_2^* = (\alpha - 1)\beta_1$ and $W = X \ln(X)$.
Note that
$\beta_1$ can be estimated by fitting the model $\hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 X$;
$\beta_2^*$ can be estimated by fitting the model $\hat{Y} = \hat{\beta}_0^* + \hat{\beta}_1^* X + \hat{\beta}_2^* W$.
Take $\alpha_1 = \hat{\beta}_2^* / \hat{\beta}_1 + 1$ as the revised estimate of $\alpha$.
This procedure may now be repeated using the new regressor $X' = X^{\alpha_1}$ in the calculations.
Box and Tidwell (1962) noted that this procedure usually converges quite rapidly, and often the first-stage
result $\alpha_1$ is a satisfactory estimate of $\alpha$. However, round-off error is potentially a problem.
Convergence problems may be encountered in cases where the error standard deviation is large or when
the range of the regressor is very small compared to its mean.
Note: $\hat{\beta}_1$ and $\hat{\beta}_1^*$ generally differ.
Example 4.5:
A research engineer is investigating the use of a windmill to generate electricity. He has collected data
on the DC output (Y ) from his windmill and the corresponding wind velocity ( X ).
R-Codes:
Y <- c(.123, .5, .653, .558, 1.057, 1.137, 1.144, 1.194, 1.562, 1.582, 1.501, 1.737, 1.822, 1.866, 1.93,
1.8, 2.088, 2.179, 2.166, 2.112, 2.303, 2.294, 2.386, 2.236,2.31)
X <- c(2.45, 2.7, 2.9, 3.05, 3.4, 3.6, 3.95, 4.1, 4.6, 5, 5.45,5.8, 6, 6.2, 6.35, 7,7.4, 7.85, 8.15, 8.8, 9.1,
9.55, 9.7, 10, 10.2)
plot(X, Y, xlab = "Wind Velocity, X", ylab = "DC Output, Y", main = "DC Output vs. Wind Velocity",
col = "Blue", pch = 19, cex=1.5)
#First iteration
Fit0 <- lm(Y~X)
FitT0 <- lm(Y~X+I(X*log(X)))
Fit0
FitT0
[Figure: DC Output vs. Wind Velocity (x-axis: Wind Velocity, X; y-axis: DC Output, Y)]
The scatter plot suggests that the relationship between DC output and wind speed is not a straight
line and that some transformation on X may be appropriate.
#First iteration
Call:
lm(formula = Y ~ X)
Coefficients:
(Intercept) X
0.1309 0.2411
Call:
lm(formula = Y ~ X + I(X * log(X)))
Coefficients:
(Intercept) X I(X * log(X))
-2.4168 1.5344 -0.4626
We begin with the initial guess $\alpha_0 = 1$ and fit the two models:
$\hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 X = 0.1309 + 0.2411X$
and
$\hat{Y} = \hat{\beta}_0^* + \hat{\beta}_1^* X + \hat{\beta}_2^* W = -2.4168 + 1.5344X - 0.4626W$
and we calculate
$\alpha_1 = \hat{\beta}_2^* / \hat{\beta}_1 + 1 = -0.4626/0.2411 + 1 = -0.9187$
as the improved estimate of $\alpha$. Note that this estimate of $\alpha$ is very close to -1, so the reciprocal
transformation, $1/X$, on X is appropriate.
R-codes:
# Install the package car from CRAN, e.g. with install.packages("car"),
# or via Menu->Packages->Install package(s) from local zip files.
library(car)
boxTidwell(Y ~ X)   # named box.tidwell() in older versions of car
Output:
Initial Power -0.91830
Score Statistic -9.13243
p-value 0.00000
MLE of Power -0.83334
iterations = 3
Here the Initial Power is the first-stage estimate $\alpha_1$, obtained with $W = X \ln X$ as defined in the procedure above.
#Second iteration
Alpha1<- FitT0$coefficients[3]/ Fit0$coefficients[2]+1
lm(Y~I(X^ Alpha1))
lm(Y~I(X^Alpha1)+I((X^ Alpha1)*log(X^ Alpha1)))
#Second iteration
Call:
lm(formula = Y ~ I(X^ Alpha1))
Coefficients:
(Intercept) I(X^ Alpha1)
3.101 -6.683
Call:
lm(formula = Y ~ I(X^ Alpha1) + I((X^ Alpha1) * log(X^ Alpha1)))
Coefficients:
(Intercept) I(X^Alpha1) I((X^ Alpha1) * log(X^ Alpha1))
3.2409 -6.4445 0.5994
To perform a second iteration, define a new regressor variable $X' = X^{-0.9183}$ and fit the models
$\hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 X' = 3.101 - 6.683X'$
and
$\hat{Y} = \hat{\beta}_0^* + \hat{\beta}_1^* X' + \hat{\beta}_2^* W' = 3.2409 - 6.4445X' + 0.5994W'$
where $W' = X' \ln X'$. The second-step estimate of $\alpha$ is thus
$\alpha_2 = \hat{\beta}_2^* / \hat{\beta}_1 + \alpha_1 = 0.5994/(-6.683) + (-0.9183) = -1.01$
which again supports the use of the reciprocal transformation on X.
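The repeated refitting can also be wrapped in a loop. The sketch below is one possible one-regressor version, not the notes' own code: it assumes the extra regressor is taken as $X^{\alpha}\ln X$ (the derivative of $X^{\alpha}$ with respect to $\alpha$) rather than $X' \ln X'$, so that each step simply adds the ratio of the two coefficients to the current $\alpha$. The function name `box_tidwell_1d` and its arguments are hypothetical.

```r
# Windmill data from Example 4.5
Y <- c(.123, .5, .653, .558, 1.057, 1.137, 1.144, 1.194, 1.562, 1.582, 1.501,
       1.737, 1.822, 1.866, 1.93, 1.8, 2.088, 2.179, 2.166, 2.112, 2.303,
       2.294, 2.386, 2.236, 2.31)
X <- c(2.45, 2.7, 2.9, 3.05, 3.4, 3.6, 3.95, 4.1, 4.6, 5, 5.45, 5.8, 6, 6.2,
       6.35, 7, 7.4, 7.85, 8.15, 8.8, 9.1, 9.55, 9.7, 10, 10.2)

# One-regressor Box-Tidwell iteration (sketch).
# Assumption: W = X^alpha * log(X), so each step updates
# alpha_new = alpha + gamma / beta1.
box_tidwell_1d <- function(y, x, alpha = 1, max_iter = 25, tol = 1e-4) {
  history <- alpha
  for (i in seq_len(max_iter)) {
    xp <- x^alpha                          # current transformed regressor
    w  <- xp * log(x)                      # extra regressor
    beta1 <- coef(lm(y ~ xp))[2]           # slope from the model without W
    gamma <- coef(lm(y ~ xp + w))[3]       # coefficient of W
    alpha_new <- unname(alpha + gamma / beta1)
    history <- c(history, alpha_new)
    if (abs(alpha_new - alpha) < tol) break
    alpha <- alpha_new
  }
  history                                  # all iterates, starting value first
}

alpha_path <- box_tidwell_1d(Y, X)
alpha_path
```

The second element of `alpha_path` is the first-stage estimate; under the stated assumption the later iterates can be compared with the "MLE of Power" in the car output rather than with the hand calculation, which re-forms $W'$ from $X'$.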
Generalized and Weighted Least Squares
4.2.1 Generalized Least Squares
A difficulty with transformations of Y is that they may create an inappropriate regression
relationship. When an appropriate regression relationship has been found but the variances of the error
terms are unequal, an alternative to transformation is weighted least squares.
Consider the model: $y = X\beta + \varepsilon$, with $E(\varepsilon) = 0$ and $\mathrm{Var}(\varepsilon) = \sigma^2 V$.
The ordinary least-squares estimator $\hat{\beta} = (X'X)^{-1}X'y$ is no longer appropriate.
Note:
$\sigma^2 V$ is the covariance matrix of the errors and we define $V = K'K = KK$, where K is a nonsingular
symmetric matrix. The matrix K is often called the square root of V.
Define the new variables
$Z = K^{-1}y$, $B = K^{-1}X$, $g = K^{-1}\varepsilon$
The regression model can be transformed as
$K^{-1}y = K^{-1}X\beta + K^{-1}\varepsilon$ or $Z = B\beta + g$
where the errors in this transformed model have zero expectation,
i.e. $E(g) = K^{-1}E(\varepsilon) = 0$
and the covariance matrix of g is
$\mathrm{Var}(g) = E\{[g - E(g)][g - E(g)]'\} = E(gg') = E(K^{-1}\varepsilon\varepsilon'K^{-1}) = K^{-1}E(\varepsilon\varepsilon')K^{-1}$
$= \sigma^2 K^{-1}VK^{-1} = \sigma^2 K^{-1}KKK^{-1} = \sigma^2 I$
Thus, the elements of g have mean zero and constant variance and are uncorrelated.
Since the errors g in this new model satisfy the usual assumptions, we may apply ordinary least squares.
The least squares function is $S(\beta) = g'g = \varepsilon'V^{-1}\varepsilon = (y - X\beta)'V^{-1}(y - X\beta)$.
The normal equations are $(X'V^{-1}X)\hat{\beta} = X'V^{-1}y$.
The solution to these equations is $\hat{\beta} = (X'V^{-1}X)^{-1}X'V^{-1}y$.
Notes:
1. $E(\hat{\beta}) = \beta$
2. $\mathrm{Var}(\hat{\beta}) = \sigma^2 (B'B)^{-1} = \sigma^2 (X'V^{-1}X)^{-1}$
3. When $V = I$, the error terms $\varepsilon$ are uncorrelated and have equal variances, and the ordinary least-
squares estimator $\hat{\beta} = (X'X)^{-1}X'y$ is appropriate.
4. When V is a diagonal matrix with unequal diagonal elements, the error terms $\varepsilon$ are uncorrelated
but have unequal variances, and the generalized least squares estimator $\hat{\beta} = (X'V^{-1}X)^{-1}X'V^{-1}y$ is used.
4.2.2 Weighted Least Squares
When the errors are uncorrelated but have unequal variances and
$V = \mathrm{diag}(1/w_1, 1/w_2, \ldots, 1/w_n)$,
let $W = V^{-1}$ (since V is a diagonal matrix, W is also diagonal with diagonal elements or weights
$w_1, w_2, \ldots, w_n$). The weighted least squares estimator $\hat{\beta} = (X'WX)^{-1}X'Wy$ is then used.
Notes:
1) $w_i$ is used to stand for weight.
2) These estimators are unbiased and have minimum variance among all unbiased linear estimators.
3) Since the weight $w_i$ is inversely related to the variance $\sigma_i^2$, it reflects the amount of information
contained in the observation $y_i$. Thus, an observation $y_i$ that has a large variance receives less
weight than another observation that has a smaller variance. The more precise $y_i$ is (i.e., the
smaller $\sigma_i^2$ is), the more information $y_i$ provides about $E(y_i)$ and therefore the more weight it
should receive in fitting the regression function.
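When the weights are known, the matrix formula and R's `lm()` with its `weights` argument give the same fit. The sketch below uses made-up data and assumes, purely for illustration, that the error variance is proportional to $x_i^2$, so that $w_i = 1/x_i^2$.

```r
# Synthetic data for illustration only; the weights are assumed known here
x <- c(1, 2, 3, 4, 5, 6, 7, 8)
y <- c(2.1, 3.9, 6.2, 7.8, 10.5, 11.4, 14.9, 15.2)
w <- 1 / x^2                           # assumes Var(e_i) proportional to x_i^2

# Matrix form of the weighted least squares estimator
X <- cbind(1, x)
W <- diag(w)
beta_wls <- solve(t(X) %*% W %*% X, t(X) %*% W %*% y)

# The same fit through lm() and its weights argument
fit_wls <- lm(y ~ x, weights = w)

cbind(matrix_form = c(beta_wls), lm_weights = coef(fit_wls))
```

Only the relative sizes of the weights matter for the coefficient estimates; multiplying every $w_i$ by the same constant leaves them unchanged.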
Problem: $w_i$ is usually unknown.
Solutions:
1) Examine a plot of $e_i$ vs. $\hat{Y}_i$ (using regular least squares estimates). When the constant variance
assumption is violated, the plot may look like:
[Figure: plot of residuals against $\hat{Y}$]
Divide the plot into 3 to 5 groups. Estimate the variance of the $e_i$'s for each group by $S_j^2$.
[Figure: the same plot divided into groups]
Set $w_j = 1/S_j^2$ where j denotes the group number.
2) Suppose the variance of the residuals is varying with one of the predictor variables. For
example, suppose the following plot is obtained.
[Figure: plot of $e_i$ against $X_k$]
Fit a simple regression model (estimated variance or standard deviation function) using the $e_i^2$
(or $|e_i|$) as the response variable and $X_{ik}$ as the predictor variable. The predicted values from
the estimated variance or standard deviation function for each observation are then used to
find the weights, $w_i = 1/\hat{V}_i$, where $\hat{V}_i$ is the estimated variance for observation i (the predicted
value from the variance function, or the square of the predicted value from the standard deviation function).
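The second solution can be sketched as follows, using a standard deviation function fitted to the absolute residuals. The data are made up for illustration, and all object names (`fit_ols`, `sd_fit`, `s_hat`) are hypothetical; this is one possible way to carry out the steps, not the notes' own code.

```r
# Synthetic data for illustration only: the spread of y grows with x
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
y <- c(4.0, 4.4, 7.2, 7.2, 10.4, 10.0, 13.6, 12.8, 16.8, 15.6)

# Step 1: ordinary least squares fit and its absolute residuals
fit_ols <- lm(y ~ x)
abs_res <- abs(residuals(fit_ols))

# Step 2: estimated standard deviation function, |e_i| regressed on x
sd_fit <- lm(abs_res ~ x)
s_hat  <- fitted(sd_fit)               # estimated standard deviation per observation

# Step 3: weights are the reciprocal of the estimated variance
stopifnot(all(s_hat > 0))              # a fitted sd <= 0 would make the weights unusable
w <- 1 / s_hat^2

# Step 4: weighted least squares refit
fit_wls <- lm(y ~ x, weights = w)
rbind(OLS = coef(fit_ols), WLS = coef(fit_wls))
```

If the weighted coefficients differ noticeably from the unweighted ones, the residuals from the weighted fit can be used to re-estimate the standard deviation function and the steps repeated.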
quant5<-quantile(x = mod.fit$fitted.values, probs =c(0.2, 0.4, 0.6, 0.8), type = 1)
round(quant5,2)
#Put Y