
Lack of fit test:

We want to test the hypotheses

$H_0$: there is no lack of fit

against

$H_1$: there is lack of fit.

What is lack of fit?

It means that the data do not fit the assumed regression model well.
Let us take the simple linear regression model

$$y = \beta_0 + \beta_1 x + \varepsilon$$

where, as usual, $y$ is the response variable, $x$ is the predictor, and $\varepsilon$ is the random error.
Now we want to check whether the sample data $(y_i, x_i)$, $i = 1, 2, \ldots, n$, fit the above linear model well or not. We use the lack of fit F-test for this purpose, as follows.
The lack of fit test requires that we have replicate observations on the response $y$ for at least one level of $x$. These replicates must be truly independent.
E.g. suppose that $y$ is product viscosity and $x$ is temperature. True replication consists of running $n_i$ separate experiments at $x = x_i$ and observing the viscosity $n_i$ times.


The readings obtained at each replication show variability in the value of $y$ for the same level $x = x_i$. Part of this variability is due to the method of measuring viscosity. The error variance $\sigma^2$ includes this measurement error and the variability associated with reaching and maintaining the same temperature level in different experiments. These replicated observations are used to obtain a model-independent estimate of $\sigma^2$.
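To make the setup concrete, here is a minimal Python sketch of what such replicated data might look like; the temperature levels and viscosity readings below are hypothetical values invented purely for illustration.

```python
# Hypothetical replicated viscosity data (all numbers invented for
# illustration): the response y is measured several times at each of
# m = 4 temperature levels x_i.
x_levels = [150.0, 160.0, 170.0, 180.0]   # temperature levels x_i
y_reps = [
    [10.2, 10.5, 10.3],                   # n_1 = 3 independent runs at x = 150
    [12.1, 11.8],                         # n_2 = 2 independent runs at x = 160
    [13.9, 14.2, 14.0],                   # n_3 = 3 independent runs at x = 170
    [15.8, 16.1],                         # n_4 = 2 independent runs at x = 180
]
m = len(x_levels)                         # number of distinct x levels
n = sum(len(reps) for reps in y_reps)     # total observations, n = sum of n_i
```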

Suppose that we have $n_i$ observations on the response at the $i$th level of the regressor $x_i$, $i = 1, 2, \ldots, m$. Let $y_{ij}$ denote the $j$th observation on the response at $x_i$, $i = 1, 2, \ldots, m$, $j = 1, 2, \ldots, n_i$. There are $n = \sum_{i=1}^{m} n_i$ total observations.

The test procedure involves partitioning the residual sum of squares into two parts, say

$$SS_{res} = SS_{pe} + SS_{lof}$$

Here

$SS_{res}$ is the sum of squares due to residuals,

$SS_{pe}$ is the sum of squares due to pure error,

$SS_{lof}$ is the sum of squares due to lack of fit.

To develop this partitioning of $SS_{res}$, note that the $(ij)$th residual is

$$y_{ij} - \hat{y}_i = (y_{ij} - \bar{y}_i) + (\bar{y}_i - \hat{y}_i)$$

Here $\bar{y}_i$ is the mean of the $n_i$ observations at $x_i$ and $\hat{y}_i$ is the corresponding fitted value. Squaring both sides and summing over $i$ and $j$, we get

$$\sum_{i=1}^{m}\sum_{j=1}^{n_i}(y_{ij} - \hat{y}_i)^2 = \sum_{i=1}^{m}\sum_{j=1}^{n_i}(y_{ij} - \bar{y}_i)^2 + \sum_{i=1}^{m}\sum_{j=1}^{n_i}(\bar{y}_i - \hat{y}_i)^2$$

$$\sum_{i=1}^{m}\sum_{j=1}^{n_i}(y_{ij} - \hat{y}_i)^2 = \sum_{i=1}^{m}\sum_{j=1}^{n_i}(y_{ij} - \bar{y}_i)^2 + \sum_{i=1}^{m} n_i(\bar{y}_i - \hat{y}_i)^2$$

since the cross-product term equals zero, and the last summand does not depend on $j$.


The left-hand side is the usual residual sum of squares. The pure error sum of squares

$$SS_{pe} = \sum_{i=1}^{m}\sum_{j=1}^{n_i}(y_{ij} - \bar{y}_i)^2$$

is obtained by computing the corrected sum of squares of the repeat observations at each level of $x$ and then pooling over the $m$ levels of $x$. If the assumption of constant variance is satisfied, this is a model-independent measure of pure error, since only the variability of the $y$'s at each $x$ level is used to compute $SS_{pe}$.
Since there are $n_i - 1$ degrees of freedom for pure error at each level $x_i$, the total number of degrees of freedom associated with the pure error sum of squares is

$$\sum_{i=1}^{m}(n_i - 1) = n - m$$
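For instance, with the hypothetical data sketched above ($m = 4$ levels with $n_i = 3, 2, 3, 2$ replicates, so $n = 10$), the pure error degrees of freedom are $(3-1) + (2-1) + (3-1) + (2-1) = 6 = n - m$.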

The sum of squares due to lack of fit,

$$SS_{lof} = \sum_{i=1}^{m} n_i(\bar{y}_i - \hat{y}_i)^2$$

is a weighted sum of the squared deviations between the mean response $\bar{y}_i$ at each $x$ level and the corresponding fitted value $\hat{y}_i$.
If the fitted values $\hat{y}_i$ are close to the corresponding average responses $\bar{y}_i$, then there is a strong indication that the regression function is linear. If the $\hat{y}_i$ deviate greatly from the $\bar{y}_i$, then it is likely that the regression function is not linear.
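Continuing the hypothetical sketch above, the following Python fragment fits the least-squares line and checks the partitioning $SS_{res} = SS_{pe} + SS_{lof}$ numerically; it is a sanity check written for this example, not a library routine.

```python
import numpy as np

# Flatten the replicated data into paired (x, y) observations.
x = np.array([xi for xi, reps in zip(x_levels, y_reps) for _ in reps])
y = np.array([yij for reps in y_reps for yij in reps])

b1, b0 = np.polyfit(x, y, 1)              # least-squares slope and intercept
y_hat = b0 + b1 * np.array(x_levels)      # fitted value at each x level
y_bar = np.array([np.mean(reps) for reps in y_reps])   # mean response per level
n_i = np.array([len(reps) for reps in y_reps])         # replicates per level

ss_res = np.sum((y - (b0 + b1 * x)) ** 2)              # residual SS
ss_pe = sum(np.sum((np.array(reps) - ybar) ** 2)       # pure error SS
            for reps, ybar in zip(y_reps, y_bar))
ss_lof = np.sum(n_i * (y_bar - y_hat) ** 2)            # lack-of-fit SS

assert abs(ss_res - (ss_pe + ss_lof)) < 1e-9           # the identity holds
```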


There are $m - 2$ degrees of freedom associated with $SS_{lof}$, since there are $m$ levels of $x$ and two degrees of freedom are lost because the two parameters $\beta_0$ and $\beta_1$ must be estimated to obtain the $\hat{y}_i$.

Computationally, $SS_{lof}$ is always calculated by subtracting $SS_{pe}$ from $SS_{res}$.

The test statistic for lack of fit is

$$F_0 = \frac{SS_{lof}/(m-2)}{SS_{pe}/(n-m)} = \frac{MS_{lof}}{MS_{pe}}$$
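Continuing the same sketch, the statistic is computed directly from the sums of squares and their degrees of freedom:

```python
# Mean squares and the lack-of-fit F statistic (continues the sketch above).
ms_lof = ss_lof / (m - 2)   # lack-of-fit mean square, m - 2 d.f.
ms_pe = ss_pe / (n - m)     # pure error mean square, n - m d.f.
F0 = ms_lof / ms_pe
```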

The expected value of $MS_{pe}$ is $\sigma^2$, and the expected value of $MS_{lof}$ is

$$E(MS_{lof}) = \sigma^2 + \frac{\sum_{i=1}^{m} n_i\,[E(y_i) - \beta_0 - \beta_1 x_i]^2}{m-2}$$

Now, if the true regression function is linear, then $E(y_i) = \beta_0 + \beta_1 x_i$, so the second term of the above equation is zero, giving $E(MS_{lof}) = \sigma^2$.

If the true regression function is not linear, then $E(y_i) \neq \beta_0 + \beta_1 x_i$ and $E(MS_{lof}) > \sigma^2$. Also, when the true regression function is linear, the statistic $F_0$ follows the $F_{m-2,\,n-m}$ distribution.


Therefore, to test for lack of fit, we compute the test statistic $F_0$ and conclude that the regression function is not linear if $F_0 > F_{\alpha,\,m-2,\,n-m}$.
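As a usage sketch (assuming SciPy is available), the decision can be made by comparing $F_0$ with the upper-$\alpha$ critical value, or equivalently by computing a p-value:

```python
from scipy.stats import f

alpha = 0.05
f_crit = f.ppf(1 - alpha, m - 2, n - m)   # upper-alpha critical value
p_value = f.sf(F0, m - 2, n - m)          # P(F >= F0) under H0

if F0 > f_crit:
    print(f"Reject H0: lack of fit detected (F0 = {F0:.3f} > {f_crit:.3f})")
else:
    print(f"No significant lack of fit (F0 = {F0:.3f}, p = {p_value:.3f})")
```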
