Stanley L. Sclove
Professor, Information & Decision Sciences Department
University of Illinois at Chicago
Outline
1. Background
2. Some Related Mathematics
3. Confidence Intervals
4. Multivariate Data
Background
Often, one has to consider the issue of whether or not to pre-process the data by transforming
them. Often the variables are positive, such as lengths, weights, prices, or blood pressures. Then
usually the transformations under consideration are the log, square root, or reciprocal.
The reciprocal transform might mean, for example, analyzing gallons per mile instead of miles
per gallon.
The log transform can be done with natural logs or common (base ten) logs. This is just a
change of scale, for $\ln y = (\ln 10)(\log y)$. To see this, suppose $y = e^{l} = 10^{m}$. Then
$\ln e^{l} = \ln 10^{m}$, and $l = m \ln 10$; that is, $\ln y = (\log y)(\ln 10)$.
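The change-of-scale relation can be checked numerically. The following is a minimal Python sketch; the value of y is arbitrary and chosen only for illustration.

```python
import math

y = 250.0  # any positive value; chosen only for illustration
natural = math.log(y)     # ln y
common = math.log10(y)    # log y (base ten)

# ln y = (ln 10)(log y): the two scales differ only by a constant factor.
print(natural, math.log(10) * common)
```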
Box and Cox defined a family of power transformations which includes any positive or negative
power, as well as the log. The transformation is
$$Y^{(\lambda)} = \frac{Y^{\lambda} - 1}{\lambda}$$
for $\lambda \neq 0$, and
$$Y^{(0)} = \ln Y.$$
The transform $Y^{(0)}$, corresponding to $\lambda = 0$, is defined by continuity as
$$\lim_{\lambda \to 0} Y^{(\lambda)}.$$
This limit gives an indeterminate form of the type zero-over-zero; applying l'Hôpital's rule shows
the limit to be $\ln Y$.
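The definition and its limiting case can be illustrated with a short Python sketch. The function name box_cox and the tolerance argument are assumptions introduced here for illustration, not part of the original notes.

```python
import math

def box_cox(y, lam, tol=1e-12):
    """Box-Cox transform of a positive number y for power lam."""
    if y <= 0:
        raise ValueError("y must be positive")
    if abs(lam) < tol:
        return math.log(y)            # the lambda = 0 case, defined by continuity
    return (y ** lam - 1.0) / lam

y = 3.0
for lam in (0.1, 0.01, 0.001):
    print(lam, box_cox(y, lam))       # values approach ln y as lam shrinks
print("ln y =", box_cox(y, 0.0))
```

Treating very small powers as zero avoids dividing by a value near zero; the tolerance itself is an arbitrary choice.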
Note that t and F statistics are invariant under linear transformation $y \to a + by$, so in this
sense it makes no difference whether $Y^{(\lambda)}$ or $Y^{\lambda}$ is analyzed. But, as will be seen, using $Y^{(\lambda)}$ puts
things on a common scale, enabling the choice among power transformations.
For $\lambda = -1$ the transform is $(Y^{-1} - 1)/(-1)$, or $-1/Y + 1$, essentially the negative reciprocal.
To see how the Box-Cox transformations work, write the likelihood of the N observations,
consider it as a function of $\lambda$ and maximize (its log) with respect to $\lambda$. To derive the likelihood,
let $F(y)$ be the cumulative distribution function (c.d.f.) of the random variable (r.v.) $Y$, evaluated
at $y$; that is, $F(y) = \Pr\{Y \le y\}$, and let $f(y)$ denote the probability density function (p.d.f.). It
is assumed that there is a value of $\lambda$ for which the r.v. $Y^{(\lambda)}$ is Normally distributed. Let $F(y^{(\lambda)})$
denote the c.d.f. of the random variable $Y^{(\lambda)}$, evaluated at $y^{(\lambda)}$. Let $\phi(y; \mu, \sigma^2)$ denote the p.d.f. of the
Normal distribution with parameters $\mu$ and $\sigma^2$. Then $\phi(y; \mu, \sigma^2) = (2\pi\sigma^2)^{-1/2} \exp[-(y - \mu)^2/(2\sigma^2)]$.
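First obtain the p.d.f. of any single observation $Y$. Write
$$\begin{aligned}
f(y) = \frac{dF(y)}{dy} &= \frac{d \Pr\{Y \le y\}}{dy} \\
&= \frac{d \Pr\{Y^{(\lambda)} \le y^{(\lambda)}\}}{dy} \\
&= \frac{dF(y^{(\lambda)})}{dy} \\
&= f(y^{(\lambda)}) \, \frac{dy^{(\lambda)}}{dy} \\
&= f(y^{(\lambda)}) \, y^{\lambda - 1} \\
&= \phi(y^{(\lambda)}; \mu, \sigma^2) \, y^{\lambda - 1}.
\end{aligned}$$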
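The change-of-variable step can be checked numerically. The Python sketch below compares a central-difference derivative of the c.d.f. with the Normal density multiplied by the factor y^(lambda - 1). The parameter values and function names are assumptions chosen for illustration.

```python
import math

def box_cox(y, lam):
    return math.log(y) if lam == 0 else (y ** lam - 1.0) / lam

def normal_pdf(x, mu, sigma2):
    return math.exp(-(x - mu) ** 2 / (2.0 * sigma2)) / math.sqrt(2.0 * math.pi * sigma2)

def normal_cdf(x, mu, sigma2):
    return 0.5 * (1.0 + math.erf((x - mu) / math.sqrt(2.0 * sigma2)))

def density_of_y(y, lam, mu, sigma2):
    # Normal density at the transformed value, times the Jacobian y^(lambda - 1)
    return normal_pdf(box_cox(y, lam), mu, sigma2) * y ** (lam - 1.0)

lam, mu, sigma2 = 0.5, 2.0, 0.25   # illustrative values only
y, h = 4.0, 1e-5

# Central difference of Pr{Y^(lambda) <= y^(lambda)} with respect to y
numeric = (normal_cdf(box_cox(y + h, lam), mu, sigma2)
           - normal_cdf(box_cox(y - h, lam), mu, sigma2)) / (2.0 * h)
print(numeric, density_of_y(y, lam, mu, sigma2))
```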
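Let $L(\lambda, \mu, \sigma^2)$ denote the likelihood, that is, the joint p.d.f. of $Y_1, Y_2, \ldots, Y_N$ at $y_1, y_2, \ldots, y_N$
considered as a function of the parameters $\lambda$, $\mu$, and $\sigma^2$. This is
$$\begin{aligned}
L(\lambda, \mu, \sigma^2) = \prod_{i=1}^{N} f(y_i) &= \prod_{i=1}^{N} \phi(y_i^{(\lambda)}; \mu, \sigma^2) \, y_i^{\lambda - 1} \\
&= (2\pi\sigma^2)^{-N/2} \exp\Big[-\sum_{i=1}^{N} (y_i^{(\lambda)} - \mu)^2 / (2\sigma^2)\Big] \prod_{i=1}^{N} y_i^{\lambda - 1}.
\end{aligned}$$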
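The log likelihood is
$$\ln L(\lambda, \mu, \sigma^2) = -(N/2) \ln(2\pi\sigma^2) - \sum_{i=1}^{N} (y_i^{(\lambda)} - \mu)^2 / (2\sigma^2) + (\lambda - 1) \sum_{i=1}^{N} \ln y_i.$$
For fixed $\lambda$, this is maximized by taking $\mu$ equal to $\bar{y}^{(\lambda)}$, the mean of the transformed observations, and $\sigma^2$ equal to $\mathrm{SSD}(\lambda)/N$, where
$$\mathrm{SSD}(\lambda) = \sum_{i=1}^{N} (y_i^{(\lambda)} - \bar{y}^{(\lambda)})^2.$$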
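This gives
$$\begin{aligned}
\max_{\mu, \sigma^2} \ln L(\lambda, \mu, \sigma^2) &= -(N/2) \ln 2\pi - (N/2) \ln \mathrm{SSD}(\lambda) + (N/2) \ln N - N/2 + (\lambda - 1) \sum_{i=1}^{N} \ln y_i \\
&= -(N/2) \ln \mathrm{SSD}(\lambda) + (\lambda - 1) \sum_{i=1}^{N} \ln y_i + \text{Constant}.
\end{aligned}$$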
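As a rough illustration of this maximization, the Python sketch below evaluates the maximized log likelihood (with the constant dropped) over a grid of powers and picks the largest value. The data, the grid, and the function names are assumptions made up for illustration; a finer grid or a numerical optimizer could be used instead.

```python
import math

def box_cox(y, lam):
    return math.log(y) if lam == 0 else (y ** lam - 1.0) / lam

def profile_loglik(ys, lam):
    """-(N/2) ln SSD(lambda) + (lambda - 1) * sum(ln y_i), constant dropped."""
    n = len(ys)
    t = [box_cox(y, lam) for y in ys]
    mean = sum(t) / n
    ssd = sum((v - mean) ** 2 for v in t)
    return -(n / 2.0) * math.log(ssd) + (lam - 1.0) * sum(math.log(y) for y in ys)

# Hypothetical positive data, made up for illustration.
ys = [1.2, 1.9, 2.4, 3.1, 4.8, 6.0, 9.5, 15.2, 22.0, 41.0]
grid = [k / 100.0 for k in range(-200, 201)]     # lambda from -2 to 2 in steps of 0.01
lam_hat = max(grid, key=lambda lam: profile_loglik(ys, lam))
print("lambda-hat on the grid:", lam_hat)
```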
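Thus the estimate of $\lambda$ is the value for which
$$-(N/2) \ln \mathrm{SSD}(\lambda) + (\lambda - 1) \sum_{i=1}^{N} \ln y_i$$
is maximized, that is, the value for which
$$(N/2) \ln \mathrm{SSD}(\lambda) - (\lambda - 1) \sum_{i=1}^{N} \ln y_i$$
is minimized. Because $\sum_{i=1}^{N} \ln y_i = N \ln(\text{g.m.})$, where
$$\text{g.m.} = \Big(\prod_{i=1}^{N} y_i\Big)^{1/N} = \exp\Big(\sum_{i=1}^{N} \ln y_i / N\Big)$$
is the geometric mean of the observations, this is equivalent to minimizing the sum of squared deviations of the normalized transform
$$\frac{Y^{\lambda} - 1}{\lambda \, (\text{g.m.})^{\lambda - 1}}.$$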
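The role of the geometric mean can be sketched in Python: dividing the transform by (g.m.)^(lambda - 1) puts the sums of squared deviations for different powers on a common scale, so the power can be chosen by minimizing that sum directly. The data, grid, and function names below are assumptions for illustration.

```python
import math

def normalized_box_cox(ys, lam):
    """Box-Cox transform divided by (g.m.)^(lambda - 1)."""
    log_gm = sum(math.log(y) for y in ys) / len(ys)   # ln of the geometric mean
    scale = math.exp((lam - 1.0) * log_gm)            # (g.m.)^(lambda - 1)
    if lam == 0:
        return [math.log(y) / scale for y in ys]
    return [(y ** lam - 1.0) / (lam * scale) for y in ys]

def ssd(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

# Hypothetical positive data, made up for illustration.
ys = [1.2, 1.9, 2.4, 3.1, 4.8, 6.0, 9.5, 15.2, 22.0, 41.0]
grid = [k / 100.0 for k in range(-200, 201)]
lam_hat = min(grid, key=lambda lam: ssd(normalized_box_cox(ys, lam)))
print("lambda minimizing the normalized SSD:", lam_hat)
```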
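Confidence Intervals
Let $\beta$ denote any one of the regression parameters in a Normal multiple linear regression model.
The F test statistic for $H_0: \beta = \beta_0$ is
$$F(\beta_0) = \frac{\mathrm{RSS}(\beta_0) - \mathrm{RSS}(\hat{\beta})}{\mathrm{RSS}(\hat{\beta})/\mathrm{RDF}},$$
where RSS denotes the residual sum of squares and RDF the residual degrees of freedom. The hypothesis is not rejected when $F(\beta_0) \le t^2$, where $t$ is the appropriate critical value of Student's t distribution with RDF degrees of freedom. Since $F(\beta_0) \le t^2$ is equivalent to $\mathrm{RSS}(\beta_0) \le \mathrm{RSS}(\hat{\beta})(1 + t^2/\mathrm{RDF})$, the confidence interval (CI) is
$$\{\beta : \mathrm{RSS}(\beta) \le \mathrm{RSS}(\hat{\beta})(1 + t^2/\mathrm{RDF})\},$$
where $\mathrm{RDF} = N - p - 1$. This applies to multiple regression models. When the model involves
only a mean, the CI for $\lambda$ is
$$\{\lambda : \mathrm{SSD}(\lambda) \le \mathrm{SSD}(\hat{\lambda})(1 + t^2/\mathrm{RDF})\},$$
where $\mathrm{RDF} = N - 1$. In terms of $s^2$, this is
$$\{\lambda : s_{\lambda}^2 \le s_{\hat{\lambda}}^2 (1 + t^2/\mathrm{RDF})\},$$
or in terms of $s$, it is
$$\{\lambda : s_{\lambda} \le s_{\hat{\lambda}} \sqrt{1 + t^2/\mathrm{RDF}}\}.$$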
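A confidence interval of this form for the mean-only case can be sketched in Python. Several things here are assumptions rather than statements from the notes: the sum of squared deviations is computed on the geometric-mean-normalized scale so that values are comparable across powers, the set is summarized by its smallest and largest grid points (which presumes it is an interval), and the critical value t is supplied by the user. The data are made up.

```python
import math

def normalized_ssd(ys, lam):
    """SSD of the Box-Cox transform after dividing by (g.m.)^(lambda - 1)."""
    n = len(ys)
    log_gm = sum(math.log(y) for y in ys) / n
    scale = math.exp((lam - 1.0) * log_gm)
    if lam == 0:
        z = [math.log(y) / scale for y in ys]
    else:
        z = [(y ** lam - 1.0) / (lam * scale) for y in ys]
    mean = sum(z) / n
    return sum((v - mean) ** 2 for v in z)

def lambda_confidence_set(ys, grid, t_crit):
    """Grid points with SSD(lambda) <= SSD(lambda-hat) * (1 + t^2 / RDF)."""
    rdf = len(ys) - 1                                  # mean-only model: RDF = N - 1
    ssds = {lam: normalized_ssd(ys, lam) for lam in grid}
    lam_hat = min(ssds, key=ssds.get)
    cutoff = ssds[lam_hat] * (1.0 + t_crit ** 2 / rdf)
    kept = [lam for lam in grid if ssds[lam] <= cutoff]
    return lam_hat, min(kept), max(kept)

# Hypothetical positive data, made up for illustration.
ys = [1.2, 1.9, 2.4, 3.1, 4.8, 6.0, 9.5, 15.2, 22.0, 41.0]
grid = [k / 100.0 for k in range(-200, 201)]
T_CRIT = 2.262   # two-sided 95% Student t critical value for 9 degrees of freedom
lam_hat, low, high = lambda_confidence_set(ys, grid, T_CRIT)
print("lambda-hat:", lam_hat, "CI:", (low, high))
```

If the interval reaches an end of the grid, the grid is too narrow and should be widened.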
Multivariate Data
Usually it is preferred to make the same transformation for all n variables. This would be the case
if all the variables are lengths, or all are weights, or all are prices, or all are blood pressures, etc.
Here is a suggestion for such situations.
To choose $\lambda$, apply the Box-Cox method to each variable separately. Typically the CIs will
overlap, i.e., will have a non-null intersection $S$. Here is one way to proceed:
If $\lambda = 1$ is in $S$,
don't transform any of the variables. If $\lambda = 1$ is not in $S$ but $\lambda = 0$ is, make the log transform on
all variables. If $\lambda = 1$ and $\lambda = 0$ are not in $S$ but $\lambda = 1/2$ is, make the square root transform on
all variables. If $\lambda = 1$, $0$, and $1/2$ are not in $S$ but $\lambda = -1$ is, make the reciprocal transform on all
variables.
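The rule above can be sketched as a small Python function that intersects the per-variable intervals and tries the candidate powers in the stated order. The function name and the example intervals are hypothetical.

```python
def choose_common_lambda(intervals, candidates=(1.0, 0.0, 0.5, -1.0)):
    """Return the first candidate power lying in every interval, else None."""
    low = max(lo for lo, hi in intervals)     # intersection S = [low, high]
    high = min(hi for lo, hi in intervals)
    if low > high:
        return None                           # the CIs do not overlap
    for lam in candidates:                    # order: none, log, square root, reciprocal
        if low <= lam <= high:
            return lam
    return None                               # S contains none of the candidates

# Hypothetical per-variable CIs for lambda, made up for illustration.
intervals = [(-0.4, 0.6), (-0.1, 0.9), (-0.3, 0.7)]
print(choose_common_lambda(intervals))        # 0.0, i.e. the log transform
```

A return value of None covers two cases the rule leaves open: no overlap at all, or an intersection that contains none of the four candidate powers.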
If the CIs don't overlap, consider increasing the confidence level (e.g., from 95% to 99%) so that
the CIs become wider and overlap.
Copyright © 2005