
Lack of fit test:

We want to test the hypotheses

$H_0$: there is no lack of fit

against

$H_1$: there is lack of fit.

What is lack of fit?

It means that the data do not fit the assumed regression model well.
Let us take the simple linear regression model

$$y = \beta_0 + \beta_1 x + \varepsilon$$

where, as usual, $y$ is the response variable, $x$ is the predictor, and $\varepsilon$ is the random error.
Now we want to check whether the sample data $(y_i, x_i)$, $i = 1, 2, \ldots, n$, fit the above linear model well or not. We use the lack of fit F-test for this purpose, as follows.
The lack of fit test requires that we have replicate observations on the response $y$ for at least one level of $x$. These replicates must be truly independent.
E.g. suppose that $y$ is product viscosity and $x$ is temperature. True replication consists of running $n_i$ separate experiments at $x = x_i$ and observing the viscosity $n_i$ times.


The readings obtained at each replication show variability in the value of $y$ for the same level $x = x_i$. Part of this variability is due to the method of measuring viscosity. The error variance $\sigma^2$ includes this measurement error and the variability associated with reaching and maintaining the same temperature level in different experiments. These replicated observations are used to obtain a model-independent estimate of $\sigma^2$.
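To make the setup concrete, here is a minimal Python sketch of what such replicated data might look like; the temperature levels and viscosity readings below are hypothetical values invented purely for illustration.

```python
# Hypothetical replicated viscosity data (all numbers invented for
# illustration): the response y is measured several times at each of
# m = 4 temperature levels x_i.
x_levels = [150.0, 160.0, 170.0, 180.0]   # temperature levels x_i
y_reps = [
    [10.2, 10.5, 10.3],                   # n_1 = 3 independent runs at x = 150
    [12.1, 11.8],                         # n_2 = 2 independent runs at x = 160
    [13.9, 14.2, 14.0],                   # n_3 = 3 independent runs at x = 170
    [15.8, 16.1],                         # n_4 = 2 independent runs at x = 180
]
m = len(x_levels)                         # number of distinct x levels
n = sum(len(reps) for reps in y_reps)     # total observations, n = sum of n_i
```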

Suppose that we have $n_i$ observations on the response at the $i$th level of the regressor $x_i$, $i = 1, 2, \ldots, m$. Let $y_{ij}$ denote the $j$th observation on the response at $x_i$, $i = 1, 2, \ldots, m$, $j = 1, 2, \ldots, n_i$. There are $n = \sum_{i=1}^{m} n_i$ total observations.

The test procedure involves partitioning the residual sum of squares into two parts, say

$$SS_{res} = SS_{pe} + SS_{lof}$$

Here

$SS_{res}$ is the sum of squares due to residuals,

$SS_{pe}$ is the sum of squares due to pure error,

$SS_{lof}$ is the sum of squares due to lack of fit.

To develop this partitioning of $SS_{res}$, note that the $(ij)$th residual is

$$y_{ij} - \hat{y}_i = (y_{ij} - \bar{y}_i) + (\bar{y}_i - \hat{y}_i)$$

Here $\bar{y}_i$ is the mean of the $n_i$ observations at $x_i$ and $\hat{y}_i$ is the corresponding fitted value. Squaring both sides and summing over $i$ and $j$, we get

$$\sum_{i=1}^{m}\sum_{j=1}^{n_i}(y_{ij} - \hat{y}_i)^2 = \sum_{i=1}^{m}\sum_{j=1}^{n_i}(y_{ij} - \bar{y}_i)^2 + \sum_{i=1}^{m}\sum_{j=1}^{n_i}(\bar{y}_i - \hat{y}_i)^2$$

$$\sum_{i=1}^{m}\sum_{j=1}^{n_i}(y_{ij} - \hat{y}_i)^2 = \sum_{i=1}^{m}\sum_{j=1}^{n_i}(y_{ij} - \bar{y}_i)^2 + \sum_{i=1}^{m} n_i(\bar{y}_i - \hat{y}_i)^2$$

since the cross-product term equals zero, and the last summand does not depend on $j$.


The left-hand side is the usual residual sum of squares. The pure error sum of squares

$$SS_{pe} = \sum_{i=1}^{m}\sum_{j=1}^{n_i}(y_{ij} - \bar{y}_i)^2$$

is obtained by computing the corrected sum of squares of the repeat observations at each level of $x$ and then pooling over the $m$ levels of $x$. If the assumption of constant variance is satisfied, this is a model-independent measure of pure error, since only the variability of the $y$'s at each $x$ level is used to compute $SS_{pe}$.
Since there are $n_i - 1$ degrees of freedom for pure error at each level $x_i$, the total number of degrees of freedom associated with the pure error sum of squares is

$$\sum_{i=1}^{m}(n_i - 1) = n - m$$
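For instance, with the hypothetical data sketched above ($m = 4$ levels with $n_i = 3, 2, 3, 2$ replicates, so $n = 10$), the pure error degrees of freedom are $(3-1) + (2-1) + (3-1) + (2-1) = 6 = n - m$.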

The sum of squares due to lack of fit,

$$SS_{lof} = \sum_{i=1}^{m} n_i(\bar{y}_i - \hat{y}_i)^2$$

is a weighted sum of the squared deviations between the mean response $\bar{y}_i$ at each $x$ level and the corresponding fitted value $\hat{y}_i$.
If the fitted values $\hat{y}_i$ are close to the corresponding average responses $\bar{y}_i$, then there is a strong indication that the regression function is linear. If the $\hat{y}_i$ deviate greatly from the $\bar{y}_i$, then it is likely that the regression function is not linear.
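Continuing the hypothetical sketch above, the following Python fragment fits the least-squares line and checks the partitioning $SS_{res} = SS_{pe} + SS_{lof}$ numerically; it is a sanity check written for this example, not a library routine.

```python
import numpy as np

# Flatten the replicated data into paired (x, y) observations.
x = np.array([xi for xi, reps in zip(x_levels, y_reps) for _ in reps])
y = np.array([yij for reps in y_reps for yij in reps])

b1, b0 = np.polyfit(x, y, 1)              # least-squares slope and intercept
y_hat = b0 + b1 * np.array(x_levels)      # fitted value at each x level
y_bar = np.array([np.mean(reps) for reps in y_reps])   # mean response per level
n_i = np.array([len(reps) for reps in y_reps])         # replicates per level

ss_res = np.sum((y - (b0 + b1 * x)) ** 2)              # residual SS
ss_pe = sum(np.sum((np.array(reps) - ybar) ** 2)       # pure error SS
            for reps, ybar in zip(y_reps, y_bar))
ss_lof = np.sum(n_i * (y_bar - y_hat) ** 2)            # lack-of-fit SS

assert abs(ss_res - (ss_pe + ss_lof)) < 1e-9           # the identity holds
```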


There are $m - 2$ degrees of freedom associated with $SS_{lof}$, since there are $m$ levels of $x$ and two degrees of freedom are lost because the two parameters $\beta_0$ and $\beta_1$ must be estimated to obtain the $\hat{y}_i$.

Computationally, $SS_{lof}$ is always calculated by subtracting $SS_{pe}$ from $SS_{res}$.

The test statistic for lack of fit is

$$F_0 = \frac{SS_{lof}/(m-2)}{SS_{pe}/(n-m)} = \frac{MS_{lof}}{MS_{pe}}$$
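Continuing the same sketch, the statistic is computed directly from the sums of squares and their degrees of freedom:

```python
# Mean squares and the lack-of-fit F statistic (continues the sketch above).
ms_lof = ss_lof / (m - 2)   # lack-of-fit mean square, m - 2 d.f.
ms_pe = ss_pe / (n - m)     # pure error mean square, n - m d.f.
F0 = ms_lof / ms_pe
```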

The expected value of $MS_{pe}$ is $\sigma^2$, and the expected value of $MS_{lof}$ is

$$E(MS_{lof}) = \sigma^2 + \frac{\sum_{i=1}^{m} n_i\,[E(y_i) - \beta_0 - \beta_1 x_i]^2}{m-2}$$

Now, if the true regression function is linear, then $E(y_i) = \beta_0 + \beta_1 x_i$, so the second term of the above equation is zero, giving $E(MS_{lof}) = \sigma^2$.

If the true regression function is not linear, then $E(y_i) \neq \beta_0 + \beta_1 x_i$ and $E(MS_{lof}) > \sigma^2$. Also, when the true regression function is linear, the statistic $F_0$ follows the $F_{m-2,\,n-m}$ distribution.


Therefore, to test for lack of fit, we compute the test statistic $F_0$ and conclude that the regression function is not linear if $F_0 > F_{\alpha,\,m-2,\,n-m}$.
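As a usage sketch (assuming SciPy is available), the decision can be made by comparing $F_0$ with the upper-$\alpha$ critical value, or equivalently by computing a p-value:

```python
from scipy.stats import f

alpha = 0.05
f_crit = f.ppf(1 - alpha, m - 2, n - m)   # upper-alpha critical value
p_value = f.sf(F0, m - 2, n - m)          # P(F >= F0) under H0

if F0 > f_crit:
    print(f"Reject H0: lack of fit detected (F0 = {F0:.3f} > {f_crit:.3f})")
else:
    print(f"No significant lack of fit (F0 = {F0:.3f}, p = {p_value:.3f})")
```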
