W. Jebarani
M. Thirunavukkarasu
2005
ACKNOWLEDGEMENT
The authors thank the Dean, Madras Veterinary College, Chennai - 600 007, for encouragement. The suggestions of the Professor and Head, Department of Animal Husbandry Statistics and Computer Applications, on the contents of the manual are gratefully acknowledged.
Authors
CONTENTS

TOPIC                              PAGE NO.
Theory of Sampling                 01 - 16
Tests of Significance              17 - 34
...                                35 - 37
...                                38 - 42
Path Analysis                      43 - 46
Design of Experiment               47 - 61
Factorial Experiment               62 - 71
...                                72 - 76
...                                77 - 90
List of Tables                     91 - 100
THEORY OF SAMPLING
The data on needs and resources may be classified into the following two groups:

1. Survey Data: This type of data already exists and can be collected and recorded by observation or enquiry.

2. Experimental Data: This type of data can only be obtained with the help of well-planned and designed experiments.

By population (or universe) is meant the aggregate of all units of a given type under consideration at a particular point of time. The information needed about the population is normally the totals or averages of the values of various characteristics.
Information on a population may be collected in two ways. One is called complete
enumeration or census and the other is called sample enumeration or sample survey.
Parameter: Statistical measures pertaining to the population, e.g. the mean or standard deviation of the population and the like.
Principles of Sample Survey: The theory of sampling is based on the following important principles:

1. Principle of Statistical Regularity: This principle has its origin in the mathematical theory of probability. According to King, 'the law of statistical regularity lays down that a moderately large number of items chosen at random from a large group are almost sure on the average to possess the characteristics of the large group.' This principle stresses the
desirability and importance of selecting the sample at random so that each and every unit in the
population has an equal chance of being selected in the sample.
An immediate derivation from the principle of statistical regularity is the Principle of
Inertia of large Numbers which states that, "other things being equal, as the sample size
increases, the result tend to be more reliable and accurate." This is because in dealing with
large numbers the variations in the component parts tend to balance each other and
consequently the variation in the aggregate result is likely to be insignificant. For example, in a
coin tossing experiment, the results will be approximately 50% heads and 50% tails provided
the experiment is performed a fairly large number of times.
2. Principle of Validity: The sampling design should enable us to obtain valid tests and estimates about the parameters of the population. The samples obtained by the technique of probability sampling satisfy this principle.
3. Principle of Optimization: This principle aims at obtaining optimum results in terms of efficiency and cost of the design with the resources at our disposal. The reciprocal of the sampling variance of an estimate provides a measure of its efficiency, while a measure of the cost of the design is provided by the total expenses incurred in terms of money and man-hours. The principle of optimization consists in
i. achieving a given level of efficiency at minimum cost, and
ii. obtaining the maximum possible efficiency at a given level of cost.

Sampling and non-sampling errors: The errors involved in the collection, processing and analysis of data may be broadly classified under the following two heads:
1. Sampling Errors and 2. Non-sampling Errors
1. Sampling Errors: Sampling errors have their origin in sampling and arise due to the fact
that only a part of the population (i.e. sample) has been used to estimate population
parameters and draw inferences about the population. As such the sampling errors are absent
in a complete enumeration survey.
Sampling biases are primarily due to the following reasons:
i. Faulty selection of the sample: This arises from the use of a defective sampling technique for the selection of a sample, e.g. purposive or judgment sampling, in which the investigator deliberately selects a 'representative' sample to obtain certain results. This bias can be overcome by strictly adhering to simple random sampling, or by selecting a sample at random subject to restrictions which, while improving the accuracy, are of such a nature that they do not introduce bias in the results.
ii. Substitution: When difficulties arise in enumerating a particular sampling unit included in the random sample, the investigators usually substitute a convenient member of the population. This obviously leads to some bias, since the characteristics possessed by the substituted unit will usually differ from those possessed by the unit originally included in the sample.
iii. Improper choice of statistic: A constant error can arise from an improper choice of the statistic for estimating a population parameter. For example, if x1, x2, ..., xn is a sample of observations, then s² = Σ(xi − x̄)²/n (the sum running over i = 1 to n) is a biased estimate of the population variance, whereas S² = Σ(xi − x̄)²/(n − 1) is unbiased.
Remark: Increase in the sample size (i.e. the number of units in the sample) usually
results in the decrease in sampling error. In fact, in many situations this decrease in sampling
error is inversely proportional to the square root of the sample size as illustrated in the diagram
given below.
[Figure: sampling error plotted against sample size n, falling off in proportion to 1/√n.]
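The inverse-square-root behaviour described in the remark can be illustrated with a short simulation. This is a sketch in Python (the manual itself contains no code; all names and figures here are illustrative):

```python
import random
import statistics

random.seed(42)

def se_of_mean(population, n, reps=500):
    """Empirical standard error: the s.d. of the means of many samples of size n."""
    means = [statistics.mean(random.sample(population, n)) for _ in range(reps)]
    return statistics.stdev(means)

# a synthetic population with standard deviation of about 10
population = [random.gauss(50, 10) for _ in range(4000)]
for n in (16, 64, 256):
    print(n, round(se_of_mean(population, n), 2))   # roughly 10 / sqrt(n)
```

Quadrupling the sample size roughly halves the standard error, as the diagram indicates.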
2. Non-sampling Errors: As distinct from sampling errors, which are due to the
inductive process of inferring about the population on the basis of a sample, the non-sampling
errors primarily arise at the stages of observation, ascertainment and processing of the data
and are thus present in both the complete enumeration survey and the sample survey. Thus,
the data obtained in a complete census, although free from sampling errors, would still be
subject to non-sampling errors whereas data obtained in a sample survey would be subject to
both sampling and non-sampling errors.
Non-sampling errors can occur at every stage of the planning or execution of census or
sample survey. The preparation of an exhaustive list of all the sources of non-sampling errors
is a very difficult task. However, careful examination of the major phases of a survey (complete or sample) indicates that non-sampling errors arise mainly from the following factors.

i. Faulty planning or definitions: The planning of a survey consists in explicitly stating the objectives of the survey. These objectives are then translated into (a) a set of definitions of the characteristics for which data are to be collected, and (b) a set of specifications for collection, processing and publishing. Here the non-sampling errors can be
due to:
a. Data specification being inadequate and inconsistent with respect to the objectives
of the survey.
b. Error due to location of the units and actual measurement of the characteristics, errors in recording the measurements, errors due to ill-designed questionnaires, etc.
c. Lack of trained and qualified investigators and lack of adequate supervisory staff.
ii. Response errors: These errors arise from inaccurate information furnished by the respondents and may be due to any of the following reasons:
a.
b.
c. Self-interest: Quite often, in order to safeguard one's self-interest, one may give incorrect information; e.g. a person may give an underestimate of his salary or production and an overstatement of his expenses or requirements.
d.
Bias due to interviewer: Sometimes the interviewer may affect the accuracy of the response by the way he asks questions or records them. The information obtained on suggestions from the interviewer is very likely to be influenced by the interviewer's beliefs and prejudices.
e.
3. Non-response bias: These errors occur when full information is not collected on all the sampling units. In a house-to-house survey, non-response usually results if the respondent is not found at home even after repeated calls, or if he is unable to furnish the information on all the questions, or if he refuses to answer certain questions. Some bias is therefore introduced as a consequence of the exclusion of a section of the population with certain peculiar characteristics, due to non-response.
4. Errors in Coverage: If the objectives of the survey are not precisely stated in clear-cut terms, this may result in (i) the inclusion in the survey of certain units which are not to be included, or (ii) the exclusion of certain units which were to be included under the objectives. For example, in a census to determine the number of individuals in the age group, say, 20 years to 50 years, more or less serious errors may occur in deciding whom to enumerate unless the particular community or area is specified, and also the time at which the age is to be reckoned.
5. Compiling errors: The various operations of data processing, such as editing and coding of the responses, punching of cards, tabulation and summarizing the original observations made in the survey, are a potential source of error. Compilation errors are subject to control through verification, consistency checks, etc.
6. Publication errors: Errors committed during the presentation and printing of tabulated results are basically due to two sources. The first refers to the mechanics of publication - the proofing errors and the like. The other, which is of a more serious nature, lies in the failure of the survey organization to point out the limitations of the statistics.
Remarks: 1. In a sample survey, non-sampling errors may also arise due to a defective frame and faulty selection of sampling units.
2. It is obvious that the non-sampling errors are likely to be more serious in a complete
census as compared to sample survey since in a sample survey the non- sampling errors can
be reduced to a greater extent by employing qualified, trained and experienced personnel,
better supervision and better equipment for processing and analyzing relatively smaller data as
compared to a complete census.
It has already been pointed out that usually sampling error decreases with increase in
sample size. On the other hand, as the sample size increases, the non-sampling error tends to
increase. Accordingly, as sample size increases, the behaviour of non-sampling error is likely to
be opposite to that of sampling error.
3. Quite often, the non-sampling error in a complete census is greater than both the
sampling and non-sampling errors taken together in a sample survey. Obviously in such
situations sample survey is to be preferred to complete enumeration survey.
Advantages of sampling:
- It saves time. Since only part of the population is studied, the time taken is less, not only in collecting the data but also in processing it.
- It saves cost. The amount of labour and expense involved is less for part of the population.
- It provides greater accuracy. Since only a limited number of observations is involved, they can be collected and processed more carefully.
- It provides more detailed information. Since we deal with only a few observations, intensive details can be collected.
- Sometimes sampling is the only method available. When the population is infinite, or if the articles are to be destroyed in testing their quality, the census method cannot be carried out.
Disadvantages:
- We have only an estimate of the population parameter, which may differ from the actual value of the parameter; however, it is possible to calculate the sampling error or standard error of the estimate.

Standard error: It is the standard deviation of the sample means.

Estimate: It is the value of the population parameter obtained from the sample.
Estimator: The statistic which is used to estimate the value of the population parameter.

Two types of estimate:
- Point estimate: a single value proposed as the estimate of the population parameter.
- Interval estimate: an estimate in which the population parameter lies between two values.
Notations used:

Characteristic values    Population    Sample
Size                     N             n
Mean                     Ȳ             ȳ
Standard Deviation       S             s
Unbiased estimator: If Y is the population parameter and Ŷ1, Ŷ2, ... are its estimates, and the mean of all possible estimates is equal to Y, then the estimator is said to be unbiased; i.e. an estimator Ŷ is unbiased if E(Ŷ) = Y.
Consistent estimator: When the sample size n is increased indefinitely, the probability of the estimate Ŷ being close to Y approaches 1, i.e. P(Ŷ → Y) approaches 1 as n → N.

Efficient estimator: Of two estimators Ŷ1 and Ŷ2 of the same parameter, Ŷ1 is more efficient than Ŷ2 if V(Ŷ1) < V(Ŷ2).

Linear estimator: An estimator which is a linear function of the sample observations; i.e. if y1, y2, ..., yn are the sample observations, Ŷ = k1 y1 + k2 y2 + ... + kn yn, where k1, k2, ..., kn are constants.
Uses of standard error:
- Used as an instrument/basis in testing hypotheses. The standard error of the mean indicates the average variation of sample means from the population mean.
- The magnitude of the standard error gives us an idea about the unreliability of the sample. The greater the standard error, the greater the departure of the actual value from the expected one, and hence the greater the unreliability of the sample. The reciprocal of the standard error is taken as a measure of the reliability or precision of the sample:

Precision = 1 / S.E.,  where S.E. = S.D. / √n

As n increases, S.E. decreases and so precision increases. The more the replication, the more will be the precision.
- With the help of the standard error we can determine the limits within which the population parameter is expected to lie.

Sampling methods fall under two heads:
i. Random sampling
ii. Non-random sampling: 1. Judgement sampling, 2. Convenient sampling, 3. Quota sampling

Simple random sampling is the technique of drawing a sample in such a way that each unit of the population has an equal chance of being selected in the sample.

1. Lottery method:
This is a very popular method of taking a random sample and in this method all the
items of universe are numbered or named on separate slips of papers of identical shape or
size. These slips are folded in the same manner and dropped in a container or drum. They are
shuffled well. One slip after another is taken till the required sample size is obtained. The
number in slips must constitute the sample. Thus the selection of items depend entirely on
chance.
The lottery method is quite cumbersome when the size of the population is large.
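The slip-drawing procedure can be sketched as follows. This is illustrative Python, not part of the manual; shuffling the list plays the role of shuffling the drum of slips:

```python
import random

def lottery_sample(units, n, seed=7):
    """Lottery method: write each unit on a slip, shuffle the slips well,
    and draw one slip after another until the sample size is reached."""
    slips = list(units)
    random.Random(seed).shuffle(slips)   # shuffling the container of slips
    return slips[:n]                     # drawing n slips without replacement

herd = list(range(1, 101))               # a population of 100 numbered units
print(lottery_sample(herd, 5))
```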
2. Using a table of random numbers:
The alternative method of random selection is using a table of random numbers. The following are some of the tables of random numbers available.
1. Tippett's random numbers table, consisting of 41,600 random units grouped into 10,400 sets of four-digit random numbers.
2. Fisher and Yates random number table, 15,000 random units arranged into 1,500 sets.
3. Kendall and Smith random numbers table, having 10 lakh random digits grouped into 25,000 sets of four-digit random numbers.
4. Rand Corporation table of random numbers, consisting of 1 lakh random digits grouped into 20,000 sets of five-digit random numbers.
5. C.R. Rao, Mitra and Matthai table of random numbers, consisting of 20,000 random digits.
E.g. Suppose the population size is N = 30 and the sample size n = 15. First number the population from 1 to 30. Since 30 is a two-digit number, choose a two-digit random number table. The maximum multiple of 30 which is also a two-digit number is 90. In the two-digit random number table choose from 01 to 90, rejecting 91-99 and 00, so as to give an equal chance to all units. Have a random start in the two-digit random number table: start anywhere and select that number if it is less than or equal to 30; if it is greater than 30, divide by 30 and take the remainder; if the remainder is 0, that corresponds to 30. Repeat the process till we get 15 different numbers (i.e. the required sample size). The units corresponding to the chosen numbers constitute the sample.
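The selection rule in the example (reject numbers above 90, map the rest into 1-30 by remainders) can be sketched as follows. This is illustrative Python; a pseudo-random generator stands in for the printed table:

```python
import random

def remainder_method_sample(N, n, seed=1):
    """Select n distinct units from 1..N via two-digit random numbers:
    reject numbers above the largest two-digit multiple of N, divide the
    rest by N and take the remainder (a remainder of 0 corresponds to N)."""
    rng = random.Random(seed)
    limit = (99 // N) * N            # for N = 30 this is 90
    chosen = []
    while len(chosen) < n:
        r = rng.randint(1, 99)       # a two-digit random number; 00 is rejected
        if r > limit:
            continue                 # reject 91-99 to keep the chances equal
        unit = r % N or N            # remainder 0 stands for unit N
        if unit not in chosen:       # repeat until n different numbers are found
            chosen.append(unit)
    return chosen

print(sorted(remainder_method_sample(30, 15)))
```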
Estimation of point estimate and confidence interval for population mean

If y1, y2, ..., yn are the sample values chosen from a population of N units, then the mean of the sample is

ȳ = (y1 + y2 + ... + yn) / n = Σ yi / n

The sample mean ȳ is taken as the estimate of the population mean Ȳ.

The standard error of ȳ is estimated by

S.E.(ȳ) = √[ (1 − n/N) s²/n ],  where s² = Σ (yi − ȳ)² / (n − 1)

When n > 30, the 95% confidence interval for the population mean Ȳ is given by ȳ ± 1.96 S.E.(ȳ), and the 99% confidence interval by ȳ ± 2.58 S.E.(ȳ).

For small samples (n < 30), the 95% confidence interval for Ȳ is given by ȳ ± t(n−1)(5%) S.E.(ȳ), and the 99% confidence interval by ȳ ± t(n−1)(1%) S.E.(ȳ).
Stratified random sampling

It is employed when there is reason to believe that the character under investigation is highly variable, or when estimates are required for sub-populations along with the population. The population is divided into non-overlapping strata and a simple random sample is drawn from each; Yi1, Yi2, ... denote the population values and yi1, yi2, ... the sample values of the ith stratum.

Advantages:
1. It is more representative: In a simple random sample, some strata may be over-represented while others may be under-represented or omitted altogether.

2. Greater accuracy: A stratified random sample provides estimates with increased precision. Moreover, it enables us to obtain results of known precision for each stratum.
3.
Administrative convenience:
As compared with simple random sample, the stratified random sample would be
concentrated geographically. Accordingly, time and money involved in collecting the data and
interviewing the individuals will be considerably reduced and supervision of field work could be
allotted with greater convenience.
Estimate of population mean and its confidence interval

Let ȳ1, ȳ2, ..., ȳk denote the sample means of the 1st stratum, 2nd stratum, ..., kth stratum. Then ȳ1 is the unbiased estimate of the 1st stratum population mean, ȳ2 is the unbiased estimate of the 2nd stratum population mean, and so on; in general, ȳi is the unbiased estimate of the ith stratum population mean Ȳi.

The estimate of the population mean is the weighted mean of the stratum means, with weights wi = Ni/N:

ȳStRS = (N1 ȳ1 + N2 ȳ2 + ... + Nk ȳk) / N,  where N = N1 + N2 + ... + Nk

i.e. ȳStRS = Σ Ni ȳi / N, the sum running over i = 1 to k.
12
~::,':
.'
,.;>;'
THEORY OF SAMPLING
Estimate of.
Standfd ~QI' of
YSt.RS
where
Sj2
stratum
YSt.RS
St.R~ 1.96
is given by Y
Yslit . is given by
, I
1\
SE [
YSt.RS)
2.58 SE
YSt.RS
tn-1
(5%)
SE
YStRS
1\
YStRS
is given by
tn-1 (1%) SE
1\
YStRS
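The weighted estimate and its standard error can be sketched in Python as follows (the strata here are hypothetical):

```python
import math
import statistics

def stratified_mean_se(strata):
    """ybar_StRS = sum(Ni * ybar_i) / N, and
    S.E. = sqrt(sum(wi^2 * (1 - ni/Ni) * si^2 / ni)) with wi = Ni / N."""
    N = sum(Ni for Ni, _ in strata)
    mean = sum(Ni * statistics.mean(s) for Ni, s in strata) / N
    var = sum((Ni / N) ** 2 * (1 - len(s) / Ni) * statistics.variance(s) / len(s)
              for Ni, s in strata)
    return mean, math.sqrt(var)

# hypothetical strata: (stratum size Ni, sample drawn from that stratum)
strata = [(200, [48, 50, 52, 49]), (300, [60, 63, 59, 62, 61])]
m, se = stratified_mean_se(strata)
print(round(m, 2), round(se, 2))
```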
Systematic sampling
It is a type of simple random sample. It is a commonly employed technique when a complete and up-to-date list of sampling units is available. It consists in selecting only the first unit at random, the rest being selected according to a predetermined pattern involving regular spacing of units. Suppose the 'N' population units are serially numbered from 1 to N in some order and a sample of size 'n' is to be chosen; let k = N/n. This k is called the sampling interval or sampling ratio. Have a random start by choosing a number less than or equal to k at random, say p; then choose every kth item: p, p + k, p + 2k, ..., and the units corresponding to these numbers are the chosen sample.
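The selection pattern (random start p, then every kth unit) can be sketched as follows (illustrative Python):

```python
def systematic_sample(N, n, start):
    """Systematic sample: with sampling interval k = N // n, take the units
    start, start + k, start + 2k, ... (start is the random start in 1..k)."""
    k = N // n
    if not 1 <= start <= k:
        raise ValueError("the random start must lie between 1 and k")
    return [start + i * k for i in range(n)]

print(systematic_sample(100, 10, start=4))   # [4, 14, 24, ..., 94]
```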
Merits
- It is more convenient than simple random sampling or stratified random sampling, as the time and work involved are relatively less.
- A systematic sample will be more efficient than a simple random sample provided the list from which the sample units are drawn is known.
Demerits
- If N is not a multiple of n, the actual sample size will differ from the required size.
- It may lead to highly biased estimates if there are periodic features associated with the sampling interval, i.e. if the list has a periodic feature and the sampling interval is equal to, or a multiple of, that period.
Cluster sampling
In this case, total population is divided into some recognizable subdivisions termed as
clusters depending on the problem under study, and simple random sample of these clusters is
drawn. We can observe each and every unit in the selected clusters which is our sample. For
example, we are interested in obtaining income of a city, whole city is divided into N different
blocks or localities and simple random sample of blocks is drawn. The individuals in the
selected blocks determine the cluster sample.
Multistage sampling
Instead of enumerating all sample units in the selected clusters, one can obtain better and more efficient estimators by resorting to sub-sampling within the clusters. This technique is called two-stage sampling, the clusters being termed first-stage sampling units; carried to further stages, the technique is called multistage sampling.

For example, if we want to study the consumption pattern of households in Tamil Nadu, the whole of Tamil Nadu is first divided into districts, which are the first-stage sampling units, and a simple random sample of a few districts is selected. The selected districts are divided into villages, from which a simple random sample of villages is selected; these are called the second-stage sampling units. The selected villages are divided into households, and a few households are selected; this is the third stage, and here we get the desired sampling units.

At every stage some districts and some villages are left out; this is the disadvantage of the method.
Non-random sampling methods:

1. Judgement sampling

2. Convenient sampling:
If the investigator chooses the samples at his convenience, it is called convenient sampling.
3. Quota sampling:
It is a type of judgement sampling wherein quotas are setup according to some
specified characteristics such as 'this much in the group', 'this much in other group' and so on.
Estimate of proportion and S.E. of proportion

1. Simple random sampling: If p is the sample proportion, the estimate of its standard error is

S.E.(p) = √[ ((N − n)/N) (pq / (n − 1)) ],  where q = 1 − p, n is the sample size and N is the population size.

The confidence interval for the population proportion P is p ± 1.96 S.E.(p) for large samples, or p ± t(n−1)(5%) S.E.(p) with (n − 1) d.f. for small samples.
2. Stratified random sampling: The estimate of the population proportion is

pStRS = Σ Ni pi / N

and its standard error is estimated by

S.E.(pStRS) = √[ Σ wi² ((Ni − ni)/Ni) (pi qi / (ni − 1)) ],  where wi = Ni/N and qi = 1 − pi.

The confidence interval is pStRS ± t(n−1)(5%) S.E.(pStRS) for small samples; if n > 50, it is pStRS ± 1.96 S.E.(pStRS).
TESTS OF SIGNIFICANCE
It is a statistical procedure followed to test the significant difference between statistics
and the parameter or between any two statistics. i.e. between sample mean and population
mean or between two sample means.
Hypothesis: Any statement made about the population.
Null hypothesis: There is no significant difference between the statistic and the parameter.
Test Statistic: The Test statistic is some statistic that may be computed from the data
of the sample. It is to be pointed out that the key to the statistical inference is the sampling
distribution of the relevant statistic. General formula for a test statistic that will be applicable in
many of the hypothesis test will be as follows:
Test statistic = (relevant statistic − hypothesized parameter) / standard error of the statistic
Decision Rule: All possible values that the test statistic can assume are points on the
horizontal axis of the graph of the distribution of the test statistic and are divided into two
groups. One group constitutes what is known as the rejection region and the other group makes up the non-rejection region. The decision rule tells us to reject the null hypothesis if the value of the test statistic that we compute from our sample is one of the values in the rejection region, and to not reject Ho if the computed value of the test statistic is one of the values in the non-rejection region.
Significance Level: The decision as to which values go into the rejection region and which ones go into the non-rejection region is made on the basis of the desired level of significance, designated by α. Tests are sometimes called significance tests, and a computed value of the test statistic that falls in the rejection region is said to be significant. The level of significance α specifies the area under the curve of the distribution of the test statistic that is above the values on the horizontal axis constituting the rejection region.

The level of significance α is a probability and, in fact, is the probability of rejecting a true null hypothesis. We select a small value of α in order to make the probability of rejecting a true null hypothesis small. The most frequently encountered values of α are 0.01 and 0.05.
Calculation of the test statistic: From the data contained in the sample, we compute a value of the test statistic and compare it with the rejection and non-rejection regions that have already been specified.

Statistical Decision: The statistical decision consists of rejecting or of not rejecting the null hypothesis. It is rejected if the computed value of the test statistic falls in the rejection region, and it is not rejected if the computed value falls in the non-rejection region.
[Figures (i) and (ii): sketches of the distribution of the test statistic showing the rejection regions in the tails and the non-rejection region in the centre, at the 1% and 5% levels of significance.]
Degrees of freedom:
Degrees of freedom of a statistic are the number of independent observations in the sample (n) minus the number of parameters (k) which must be estimated from the sample observations; or, the number of independent comparisons in the sample observations; or, the number of values that we can choose freely.
Different tests of significance: tests are broadly parametric or non-parametric. The parametric tests include: 1. Z-test, 2. t-test, 3. Chi-square test, 4. F-test.
1. Null hypothesis may be true but our test rejects it - which is Type I error.
2. Null hypothesis may be false but our test accepts it - which is Type II error.
3.
3.
Null hypothesis may be true and our test accepts it- which is correct decision.
4.
Null hypothesis may be false and our test rejects it -which is correct decision.
Step 2: The general test statistic is

(statistic − parameter) / S.E. of the difference

Z-test: This is carried out when the sample size is large, i.e. n > 30. When n > 30, the statistic follows the normal distribution, whose equation is

f(x) = (1/(σ√(2π))) e^(−(x − m)²/(2σ²)),  where m = mean and σ = standard deviation.
I) To test the significant difference between sample mean and population mean

Step 1: Ho: There is no significant difference between the sample mean and the population mean.

Step 2: The test statistic, or 'Z' statistic, is given by

Z = (x̄ − m) / S.E.(x̄ − m) = (x̄ − m) / (σ/√n)
Step 3: Conclusion
i) If |Z| < 1.96, Z is not significant; we denote this as Z = ( )N.S. and Ho is accepted.
ii) If |Z| > 1.96, Z is significant at the 5% level; Z = ( )* and Ho is rejected.
iii) If |Z| > 2.58, Z is highly significant at the 1% level; Z = ( )** and Ho is rejected.
Example: Given a sample of n = 300 with sample mean x̄ = 2.1 kg, population mean m = 1.95 kg and standard deviation 0.24 kg.

Step 1: Ho: There is no significant difference between the sample mean and the population mean.

Step 2: Z = (2.1 − 1.95) / (0.24/√300) = 10.83**

Step 3: Since |Z| > 2.58, Z is highly significant and Ho is rejected.
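Recomputing the Z statistic from the figures quoted in the example (x̄ = 2.1, m = 1.95, σ = 0.24, n = 300) can be sketched in Python as:

```python
import math

def one_sample_z(xbar, m, sigma, n):
    """Z = (xbar - m) / (sigma / sqrt(n))."""
    return (xbar - m) / (sigma / math.sqrt(n))

z = one_sample_z(2.1, 1.95, 0.24, 300)
print(round(z, 2))    # well above 2.58, so Ho is rejected at the 1% level
```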
II) To test the significant difference between two sample means

Step 1: Ho: There is no significant difference between the two sample means.

Step 2: Z = (x̄1 − x̄2) / S.E.(x̄1 − x̄2),  where

S.E.(x̄1 − x̄2) = √( σ1²/n1 + σ2²/n2 )

Step 3: Conclusion as before: if |Z| > 1.96, Z = ( )* and Ho is rejected; if |Z| > 2.58, Z = ( )** and Ho is rejected.

If the samples are taken from the same population with common standard deviation σ, then S.E.(x̄1 − x̄2) = σ√(1/n1 + 1/n2).
Example: Test whether the two sample means differ significantly, given x̄1 = 1.7, x̄2 = 2.1, s1 = 0.72, s2 = 0.558, n1 = 45 and n2 = 35.

Step 2:

|Z| = |1.7 − 2.1| / √( (0.72)²/45 + (0.558)²/35 ) = 2.810

Step 3: |Z| = 2.810 > 2.58, so Z is highly significant and Ho is rejected; i.e. the two samples are different.
III) To test the significant difference between observed and expected proportions

Step 1: Ho: There is no significant difference between the observed and expected proportions.

Step 2: The test statistic is given by

Z = (p − P) / S.E.(p),  where S.E.(p) = √(pq/n), q = 1 − p, and n = number of trials.
Example: In a farm, 120 calves were born in a year, out of which 73 are female. Test the hypothesis that the sexes are born in equal proportion.

i) Ho: The sexes are born in equal proportion, i.e. P = 0.5.

ii) p = 73/120 = 0.6083

Z = (0.6083 − 0.5000) / √( 0.6083 × 0.3916 / 120 ) = 2.430*

Conclusion: |Z| = 2.430 > 1.96, so Z is significant and Ho is rejected.
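The calf-sex computation can be sketched in Python, using the observed p and q = 1 − p in the standard error exactly as the worked figures do:

```python
import math

def proportion_z(successes, n, P0):
    """Z = (p - P0) / sqrt(p * q / n), with p the observed proportion, q = 1 - p."""
    p = successes / n
    return (p - P0) / math.sqrt(p * (1 - p) / n)

z = proportion_z(73, 120, 0.5)   # 73 female calves out of 120; Ho: P = 0.5
print(round(z, 2))               # > 1.96, so Ho is rejected at the 5% level
```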
IV) To test the significant difference between two proportions

Step 1: Ho: There is no significant difference between the two proportions.

Step 2: Z = (p1 − p2) / S.E.(p1 − p2),  where

S.E.(p1 − p2) = √( p1 q1/n1 + p2 q2/n2 ),  q1 = 1 − p1, q2 = 1 − p2, and n1, n2 are the numbers of trials.

When the proportions are from the same population, the pooled proportion P = (n1 p1 + n2 p2)/(n1 + n2) is used, with S.E.(p1 − p2) = √( PQ (1/n1 + 1/n2) ), Q = 1 − P.

Example: With p1 = 0.032 (n1 = 500) and p2 = 0.03 (n2 = 100), P = 0.0317 and

Z = (0.032 − 0.03) / √( 0.0317 × 0.9683 × (1/500 + 1/100) ) ≈ 0.10 N.S.

Step 3: |Z| < 1.96, so Z is not significant and Ho is accepted; the treatment is not effective in controlling the disease.
V) To test the significance of an observed correlation coefficient

Step 1: Ho: There is no significant correlation.

Step 2: The test statistic is given by

Z = (r − ρ) / S.E.(r − ρ) = (r − ρ) / ((1 − ρ²)/√n)

where r is the sample correlation coefficient and ρ is the population correlation coefficient. Taking ρ = 0, since we are interested in knowing whether r differs from zero,

Z = r / ((1 − 0)/√n) = r√n

Tables are also available showing the critical values of r above which a given r is significant; here we have to look at d.f. = n − 2.
t-test

Note: This is the test to be carried out when the sample size is less than 30. The distribution is due to Gosset, who published it under the pen name 'Student'. Its density is of the form f(t) = C (1 + t²/k)^(−(k+1)/2) with k degrees of freedom, where C is a constant.

I) To test the significant difference between sample mean and population mean

Step 1: Ho: There is no significant difference between the sample mean and the population mean.
Step 2: The test statistic, or 't' statistic, is

t = (x̄ − m) / S.E.(x̄ − m) = (x̄ − m) / (s/√n)

where m = population mean, x̄ = sample mean, s = standard deviation of the sample and n = sample size.
Step 3: Conclusion
i) If |t| < table 't' at the 5% level with d.f. = n − 1, t is not significant and Ho is accepted.
ii) If |t| > table 't' at the 5% level with d.f. = n − 1, t is significant, t = ( )*, and Ho is rejected.
iii) If |t| > table 't' at the 1% level with d.f. = n − 1, t is highly significant, t = ( )**, and Ho is rejected.
II) To test the significant difference between two sample means

t = (x̄1 − x̄2) / S.E.(x̄1 − x̄2) = (x̄1 − x̄2) / √( s² (1/n1 + 1/n2) )

where the pooled variance is

s² = [ (n1 − 1)s1² + (n2 − 1)s2² ] / (n1 + n2 − 2)

with x̄1, x̄2 the sample means and s1, s2 the sample standard deviations; d.f. = n1 + n2 − 2.
Note: when n1 = n2 = n, d.f. = 2(n − 1).
Example: In a certain experiment to compare two types of pig diets, the following results of increase in weight were observed during the experimental period. Can we conclude that diet 1 is better than diet 2?

Diet 1: 52  53  57  52  49  52  50
Diet 2: 46  51  42  50  55  54

Ho: There is no significant difference between the two diets.

x̄1 = 52.142, x̄2 = 49.66, s1 = 2.54, s2 = 4.92

Pooled variance: s² = (6 × 2.54² + 5 × 4.92²) / (7 + 6 − 2) = 14.54

t = (52.142 − 49.66) / √( 14.54 (1/7 + 1/6) ) = 1.170 N.S.

Conclusion: |t| = 1.170 < 2.201 (table t at the 5% level with 11 d.f.), so t is not significant and Ho is accepted; the two diets do not differ significantly.
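The pooled-variance computation of the diet example can be sketched in Python as:

```python
import math

def pooled_t(x, y):
    """t = (x1bar - x2bar) / sqrt(s^2 (1/n1 + 1/n2)), with the pooled
    variance s^2 = ((n1-1)s1^2 + (n2-1)s2^2) / (n1 + n2 - 2)."""
    nx, ny = len(x), len(y)
    mx, my = sum(x) / nx, sum(y) / ny
    sx2 = sum((v - mx) ** 2 for v in x) / (nx - 1)
    sy2 = sum((v - my) ** 2 for v in y) / (ny - 1)
    s2 = ((nx - 1) * sx2 + (ny - 1) * sy2) / (nx + ny - 2)
    return (mx - my) / math.sqrt(s2 * (1 / nx + 1 / ny))

diet1 = [52, 53, 57, 52, 49, 52, 50]
diet2 = [46, 51, 42, 50, 55, 54]
print(round(pooled_t(diet1, diet2), 2))   # below t(11, 5%) = 2.201: not significant
```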
III) Paired t-test: to test the significant difference between paired observations of two samples

Step 1: Ho: There is no significant difference between the two samples.

Step 2: The test statistic is

t = d̄ / S.E.(d̄),  where d is the difference in the observations of the two samples,
d̄ is the mean of d and s is the s.d. of d; S.E.(d̄) = s/√n, with d.f. = n − 1.
Step 3: Conclusion as before.

Example: Rate of lean tissue growth before and after giving a ration:

Before giving ration: 420  490  340  560  670  540  530  590
After giving ration:  600  720  350  530  850  580  690  730

From the differences, d̄ = 113.75 and s = 94.10, so t = d̄/(s/√n) = 3.42*, with d.f. = 7.

|t| > table t at the 5% level with 7 d.f. (2.365), so t is significant and Ho is rejected. The ration is responsible for the increase in the rate of lean tissue growth.
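Recomputing the paired t from the before/after table gives roughly 3.42, close to the value printed in the manual (a Python sketch):

```python
import math

def paired_t(before, after):
    """t = dbar / (s_d / sqrt(n)), computed on the paired differences."""
    d = [a - b for a, b in zip(after, before)]
    n = len(d)
    dbar = sum(d) / n
    s2 = sum((x - dbar) ** 2 for x in d) / (n - 1)
    return dbar / math.sqrt(s2 / n)

before = [420, 490, 340, 560, 670, 540, 530, 590]
after = [600, 720, 350, 530, 850, 580, 690, 730]
print(round(paired_t(before, after), 2))   # compare with table t, d.f. = 7
```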
IV) To test the significance of an observed correlation coefficient

Step 1: Ho: There is no significant correlation.

Step 2: t = (r − 0) / S.E.(r) = r√(n − 2) / √(1 − r²),  with d.f. = n − 2.
Example: The coefficient of correlation between the body weight and feed intake of 15 broilers was 0.75. Test the significance of the correlation.

Ho: There is no significant correlation.

t = 0.75 √13 / √(1 − 0.75²) = 4.09**,  d.f. = 13

|t| > 3.012 (table t at the 1% level with 13 d.f.), so t is highly significant and Ho is rejected.
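The broiler computation can be sketched in Python as:

```python
import math

def corr_t(r, n):
    """t = r * sqrt(n - 2) / sqrt(1 - r^2), with d.f. = n - 2."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

t = corr_t(0.75, 15)   # body weight vs feed intake of 15 broilers
print(round(t, 2))     # exceeds t(13, 1%) = 3.012: highly significant
```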
V) To test the significance of a regression coefficient

For the regression y = a + bx, the test statistic is

t = (b − 0) / S.E.(b),  with S.E.(b) = √[ (Syy − b Sxy) / ((n − 2) Sxx) ]

where Sxy = Σxy − (Σx)(Σy)/n (Sxx and Syy being defined analogously), with d.f. = n − 2.

Step 3: Conclusion, same as previous, with the respective d.f.
CHI-SQUARE (χ²) DISTRIBUTION

The χ² density with k degrees of freedom is f(χ²) = (χ²)^(k/2 − 1) e^(−χ²/2) / (2^(k/2) Γ(k/2)).

Properties of χ²:
1. If χ1², χ2², χ3², ..., χn² are independent χ² variables with d.f. k1, k2, k3, ..., kn, then their sum χ1² + χ2² + ... + χn² is a χ² variable with d.f. = k1 + k2 + ... + kn.

2. If Z is a standard normal variable, Z² is a χ² random variable with d.f. = 1.

3. Combining 1 and 2, if Z1, Z2, ..., Zn are 'n' standard normal variables, then Z1² + Z2² + ... + Zn² is a χ² variable with d.f. = n.

Note: If X is a normal variable with mean m and s.d. σ, then the standard normal variable is Z = (X − m)/σ.

4. If X1, X2, X3, ..., Xn are 'n' independent normal variables with means m1, m2, ..., mn and variances σ1², σ2², ..., σn², then Σ (Xi − mi)²/σi² is a χ² variable with d.f. = n.

5. If X1, X2, ..., Xn is a random sample from a normal population with mean m and variance σ², then Σ (Xi − m)²/σ², summed over i = 1 to n, is distributed as χ² with d.f. = n.

6. If X1, X2, ..., Xn is a random sample from a normal population with mean m and variance σ², then the sample mean X̄ and the sample standard deviation are independent, and Σ (Xj − X̄)²/σ², summed over j = 1 to n, is distributed as χ² with d.f. = n − 1.
χ² test of goodness of fit

The purpose of this test is to determine whether the observed set of data fits an expected set. It helps us determine whether the sample results are consistent with the hypothesis that they were drawn from a population with a known distribution, e.g. the uniform, binomial, Poisson or normal distribution.

The uniform distribution assumes that all possible values are equally likely. The test is also used to check whether the observed frequencies are in a given ratio; in genetic experiments, for instance, we test whether the values are according to the Mendelian ratio (9:3:3:1).
Step 1: Ho: The observed frequencies fit the expected frequencies.

Step 2: The test statistic is χ² = Σ (O − E)²/E, with d.f. = (number of classes − 1).

Step 3: Conclusion
- If χ² < table χ² at the 5% level for the respective d.f., χ² is not significant, χ² = ( )N.S., and Ho is accepted.
- If χ² > table χ² at the 5% level for the respective d.f., χ² is significant, χ² = ( )*, and Ho is rejected.
- If χ² > table χ² at the 1% level for the respective d.f., χ² is highly significant, χ² = ( )**, and Ho is rejected.
Example: In an experiment to determine the preference among various rations, a random sample of 60 steers was introduced to 5 different rations, to which the animals had free access. The preference for a ration was measured by the number of animals eating from a particular bin at any one feeding time. The observed results are displayed below.

Ration:         1    2    3    4    5
No. of steers:  13   14   11   13   9

Under Ho (no preference), the expected frequency for each ration is 60/5 = 12.

Calculated χ² = Σ (O − E)²/E = 1.33 N.S. < table χ² at the 5% level (d.f. = 4)

so the steers show no significant preference among the rations.

Note: If any of the values is less than 5, we have to group the classes and then carry out the test.
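The steers computation can be sketched in Python. The fifth observed count is taken as 9, the value that makes the 60 animals add up; this figure is an assumption, since it is illegible in the source:

```python
def chi_square_gof(observed, expected):
    """Goodness of fit: chi-square = sum of (O - E)^2 / E over the classes."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [13, 14, 11, 13, 9]    # steers at each of the 5 ration bins
expected = [60 / 5] * 5           # equal preference: 12 per ration
print(round(chi_square_gof(observed, expected), 2))   # d.f. = 5 - 1 = 4
```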
X2 test of independence
In this test our interest is focussed on the independence of items classified according to two different criteria, by rows and columns; i.e. we test the hypothesis that the classifications represented by rows and columns are independent or associated. This test is referred to as the chi-square test of independence, or contingency test.
In case of pair of variables, the association can be studied by correlation coefficient.
But in case of attributes, we have to classify according to the contingency table and test for
independence. The results classified according to the two criteria are arranged in an r × c table in the following manner:

                        Column
    Row      1      2     ...    j     ...    c   | Total
    1       O11    O12    ...   O1j    ...   O1c  |  R1
    2       O21    O22    ...   O2j    ...   O2c  |  R2
    ...
    i       Oi1    Oi2    ...   Oij    ...   Oic  |  Ri
    ...
    r       Or1    Or2    ...   Orj    ...   Orc  |  Rr
    Total    C1     C2    ...    Cj    ...    Cc  |  N
The d.f. is given by (r − 1)(c − 1), i.e. (no. of rows − 1) × (no. of columns − 1).

    χ² = Σ Σ (Oij − Eij)² / Eij   with d.f. = (r − 1)(c − 1)

where Oij is the observed frequency in cell (i, j) and Eij is the expected frequency, the product of the row total and the column total in which the cell lies, divided by the grand total:

    Eij = (Ri × Cj) / N
If some of the values are less than 5, then we have to use Yates's correction, which is given by adding 0.5 to the value which is less than 5 and subtracting 0.5 in some other cell to keep the totals the same. In general, if d.f. = 1 and N is small, Yates's correction is needed. For small samples, when the expected frequency is between 5 and 10, it is better to compare both the corrected and uncorrected χ². If they lead to different conclusions, one can increase the sample size; if that is impractical, we have to follow the exact method of probability involving the multinomial distribution.
In the case of a 2 × 2 χ² contingency table of the following form:

                          Factor A
                   Level I   Level II |  Total
    Factor B
      Level I         a          b    |  a + b
      Level II        c          d    |  c + d
    Total           a + c      b + d  |  a + b + c + d = n
Step 1 : Ho: the factors are independent.

Step 2 : χ² is given by

    χ² = (ad − bc)² × n / [(a + b)(c + d)(a + c)(b + d)]   with d.f. = 1

and with Yates's correction

    χ² = (|ad − bc| − n/2)² × n / [(a + b)(c + d)(a + c)(b + d)]

Coefficient of contingency

It is defined as

    C = √[ χ² / (N + χ²) ]

Equivalently, where φ² = χ²/N, the coefficient of contingency is C = √[ φ² / (1 + φ²) ].
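The 2 × 2 formulas above can be checked with a short sketch; the cell counts a, b, c, d below are hypothetical, and the full denominator of the 2 × 2 formula is written out as given above.

```python
# 2x2 chi-square test of independence, with and without Yates's
# continuity correction, plus the coefficient of contingency
# C = sqrt(chi2 / (N + chi2)).  The cell counts are hypothetical.
import math

a, b, c, d = 20, 10, 5, 25
n = a + b + c + d

def chi2_2x2(a, b, c, d, yates=False):
    n = a + b + c + d
    num = abs(a * d - b * c)
    if yates:
        num = max(num - n / 2, 0)       # continuity correction
    return n * num ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

chi2 = chi2_2x2(a, b, c, d)
chi2_corr = chi2_2x2(a, b, c, d, yates=True)
C = math.sqrt(chi2 / (n + chi2))        # coefficient of contingency

print(f"uncorrected chi2 = {chi2:.3f}")
print(f"Yates-corrected chi2 = {chi2_corr:.3f}")
print(f"coefficient of contingency C = {C:.3f}")
```

Both values are referred to the table χ² with 1 d.f.; when they lead to different conclusions the remarks above on sample size apply.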
X2 test of homogeneity
It is an extension of X2 test of independence.X2 test of homogeneity is designed to
determine whether two or more independent random samples are drawn from the same
population or from different population. Instead of one sample as we use with independent
problem, we shall now have two or more samples. For example, we may be interested in
finding out whether or not university students of various level (Le) U.G, P.G. & Ph.D. feel the
same with regard to the amount of work required by their professors, i.e. too much work,
correct amount of work & too little work. We shall take the hypothesis that the three samples
come from the same populations. (i.e.) the three classifications are homogenous so far as the
opinion of three different groups of student about the amount of work required by their
professors is concerned. This also means that there exists no difference in opinion among the
three classes of people.
The test statistic used is the same as for the test of independence. The tests of independence are concerned with the problem of whether one factor is independent of another, while tests of homogeneity are concerned with whether different samples come from the same population. A test of independence involves a single sample taken from one population, but the test of homogeneity involves two or more independent samples, one from each of the possible populations in question.
Example: A firm selling four products is interested in finding out whether the sales are distributed alike among four general classes of customers. A random sample of 370 sales provides the following information.

                            Products
                        1     2     3     4  | Total
    Farmers            25    10    30    15  |   80
    Factory Workers    32    20    10    28  |   90
    Business Men       35    48    25    10  |  118
    Professional       28    22    15    17  |   82
    Total             120   100    80    70  |  370

    Calculated χ² = 44.75 > Tab. χ² at (r − 1)(c − 1) = 9 d.f.

χ² = (44.75)** is highly significant and Ho is rejected: the sales of the four products are not distributed alike among the four classes of customers.
'F' DISTRIBUTION

F is the ratio of two χ² variables, each divided by its degrees of freedom: if U and V are two independent χ² variables with d.f. n1 and n2, then

    F = (U/n1) / (V/n2)

follows the F distribution with (n1, n2) d.f.

'F' is used to test the significance of the difference between the variances of two or more samples. The object of the F test is to discover whether two independent estimates of population variance differ significantly, or whether the two samples may be regarded as drawn from normal populations having the same variance.
'F' Test Procedure

Step 1 : Ho: σ1² = σ2², i.e. the two population variances are equal.

Step 2 : Test statistic

    F = s1² / s2²

where s1² and s2² are the two sample variances.

Note: We have to take the greater variance of the two samples in the numerator.
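A sketch of the procedure, assuming two hypothetical samples and `scipy` for the table value; the greater variance is placed in the numerator as the note requires.

```python
# F test for the equality of two variances: the greater sample variance
# goes in the numerator and is compared with the table F value.
from statistics import variance
from scipy import stats

x = [23, 25, 28, 30, 22, 26, 27, 29]      # hypothetical sample 1
y = [20, 21, 19, 24, 22, 23, 20, 21, 22]  # hypothetical sample 2

var_x, var_y = variance(x), variance(y)   # divisor n - 1

# greater variance in the numerator
if var_x >= var_y:
    F, df1, df2 = var_x / var_y, len(x) - 1, len(y) - 1
else:
    F, df1, df2 = var_y / var_x, len(y) - 1, len(x) - 1

crit = stats.f.ppf(0.95, df1, df2)        # table F at the 5% level
print(f"F = {F:.3f} with ({df1}, {df2}) d.f.; table F(5%) = {crit:.3f}")
print("significant" if F > crit else "not significant (NS)")
```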
NON-PARAMETRIC TESTS

Merits

The assumptions regarding populations are less restrictive, so the tests are applicable to a wider range of conditions.

Demerits

Because of their simplicity and easy computation using small samples, they are sometimes used for convenience where large-sample parametric methods would be more appropriate.

When to use non-parametric tests

When the data do not meet the assumptions required for a parametric test, or when we know that the data gathered are from a population which is not normally distributed, it is appropriate to use a non-parametric test.
SIGN TEST
This test is performed when we wish to analyse two sets of data which are from the same samples, are dependent, or occur in pairs. It depends on the sign of the difference within the paired observations. The stepwise procedure is as follows.

Let Xi and Yi be the observations of the two samples.

1. Examine the pairs of observations in the two samples, i.e. (Xi, Yi), i = 1, 2, ... etc.
2. Note the sign (+ or −) of the difference Xi − Yi for each pair.
3. Discard the pairs for which the difference is zero (ties).
4. Count the number of + signs and the number of − signs.
5. Denote the number of pairs remaining with either a + or a − sign by 'n'.
6. Denote by 'r' the number of pairs in which the less frequent sign occurs (either + or −).
7. To test the hypothesis of no difference between the effects of the two sets of data, compare 'r' with the critical value in the table corresponding to degrees of freedom = n.
8. If calculated r ≤ table r, the hypothesis is rejected; otherwise it is not rejected.
Example: Paired observations on weight gain were recorded for animals fed Ration A and Ration B.

    Ration A   26   24   25   22   18   30   26   28   ...
    Ration B   18   20   24   20   20   20   24   26   ...

The sign of the difference (A − B) was noted for each pair; after discarding ties, n = 10 pairs remained, and the less frequent sign occurred r times.

    n = 10,  table r = 1

Since calculated r > table r, the hypothesis is not rejected. Hence the two rations are not different.
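The sign test steps can be sketched programmatically: under Ho the + and − signs are equally likely, so the count of the rarer sign can be referred to the binomial distribution instead of a printed table. The paired values below are hypothetical, not the (partly illegible) data of the example above.

```python
# Sign test for paired data: under Ho, r ~ Binomial(n, 0.5), so an
# exact binomial test replaces the printed table.  Data are hypothetical.
from scipy import stats

ration_a = [26, 24, 25, 22, 18, 30, 26, 28, 26, 27]
ration_b = [18, 20, 24, 20, 20, 20, 24, 26, 26, 24]

diffs = [a - b for a, b in zip(ration_a, ration_b)]
signs = [d for d in diffs if d != 0]          # discard ties
n = len(signs)
plus = sum(1 for d in signs if d > 0)
r = min(plus, n - plus)                       # count of the rarer sign

result = stats.binomtest(r, n, 0.5)           # two-sided exact test
print(f"n = {n}, r = {r}, p = {result.pvalue:.4f}")
print("rejected" if result.pvalue < 0.05 else "not rejected")
```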
WILCOXON SIGNED RANK TEST

1. Find the difference within each pair and rank the absolute values (modulus) of the differences. The smallest difference is given rank 1 and ties are assigned average ranks.
2. Restore to each rank the sign of the corresponding difference, and find the sum of the positive ranks and the sum of the negative ranks separately.
3. Denote by T the absolute value of the smaller of the two sums of ranks found in the previous step.
4. To test the hypothesis of no difference between the effects of the two treatments, compare T with the table value of T; if calculated T ≤ table T, the hypothesis is rejected.
RUN TEST

This test is used to examine (Case 1) whether a single set of observations is random, and (Case 2) whether two random samples come from populations having the same distribution.

Case 1

1. List the observations in the order in which they were obtained, i.e. in the order of occurrence.
2. Determine the sample median. Denote observations below the median by a − sign and observations above the median by a + sign.
3. Denote the number of − signs by n1 and the number of + signs by n2.
4. Count the number of runs and denote this number by r (in terms of our symbols, a run is a sequence of signs of the same kind bounded by signs of the other kind).
5. If calculated r ≤ the critical value of r in the table for the chosen significance level, the hypothesis is rejected; otherwise it is not rejected.

Case 2

1. List the n1 + n2 observations from the two samples together in order of magnitude.
2. Identify each observation by the sample from which it came, and count the number of runs, r.
3. Compare r with the critical value in the table, as in Case 1.
When the number of variables is more than two, the relationship between the variables is either partial or multiple.

In partial correlation, we measure the correlation between a dependent variable and one particular independent variable when all other variables involved are kept constant. For example, in a study of broilers, the weight of the broilers, feed intake, labour used, medicinal cost, etc. are the variables taken for study. When we study the relationship between the weight of broilers and feed intake, eliminating the effect of the other variables, it is partial correlation.
If we denote the dependent variable by Y and the independent variables by X1, X2, X3, ... etc., the partial correlation between X1 and X2, keeping X3 constant, is calculated by

    r12.3 = (r12 − r13 r23) / √[ (1 − r13²)(1 − r23²) ]

This is a first-order partial correlation coefficient. The second-order partial correlation is given in terms of the first-order partial correlation coefficients:

    r12.34 = (r12.4 − r13.4 r23.4) / √[ (1 − r13.4²)(1 − r23.4²) ]
If ry1.(23...) is significant, then X1 contributes significantly to Y. For example, if ry1.(23...) = (0.72)**, then the contribution of X1 singly to Y is significant, and it is not through the influence of the other variables that X1 contributes.
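The first-order formula can be wrapped in a small function; the correlation values below are hypothetical.

```python
# First-order partial correlation r12.3 from the simple correlations,
# following the formula above.  The r values are hypothetical.
import math

def partial_r(r12, r13, r23):
    """Correlation between variables 1 and 2, keeping variable 3 constant."""
    return (r12 - r13 * r23) / math.sqrt((1 - r13**2) * (1 - r23**2))

# e.g. weight gain (1), feed intake (2), initial weight (3) -- hypothetical
r12, r13, r23 = 0.80, 0.60, 0.50
print(f"r12.3 = {partial_r(r12, r13, r23):.4f}")
```

The same function applied to first-order coefficients gives the second-order partial correlation, mirroring the recursive form of the formulas above.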
MULTIPLE CORRELATION
When we study the relationship between two or more independent variables jointly and one dependent variable, it is multiple correlation, denoted by Ry(x1x2x3...) or Ry(1234...), e.g. R1(23), R1(2,3,4). It measures the combined influence of all the independent variables on the dependent variable.

R is given by the correlation coefficient between the observed values of Y and the expected (estimated) values of Y.
Coefficient of multiple determination

The square of the multiple correlation (R²) is defined as the coefficient of multiple determination, or coefficient of determination.

If R² is very low or non-significant, it means that we have not included in our study the characters which significantly influence y.
I
Coefficient of non-determination

It is given by 1 − R². If R² = 0.78, then 1 − R² = 0.22, i.e. 22% of the variation in Y is due to other variables which were not taken up in the study.

Coefficient of alienation

The square root of (1 − R²) is defined as the coefficient of alienation.
MULTIPLE REGRESSION

Multiple regression analysis enables us to measure the joint effect of any number of independent variables (X1, X2, ..., Xn) on the dependent variable Y. The regression equation Y = a0 + a1X1 + a2X2 + ... + anXn is fitted by least squares, i.e. so that Σ(Y − Ŷ)² is minimum. The constants a0, a1, ..., an are obtained from

    ΣY   = a0 N   + a1 ΣX1   + a2 ΣX2   + ... + an ΣXn        (1)
    ΣX1Y = a0 ΣX1 + a1 ΣX1²  + a2 ΣX1X2 + ... + an ΣX1Xn      (2)
    ΣX2Y = a0 ΣX2 + a1 ΣX1X2 + a2 ΣX2²  + ... + an ΣX2Xn      (3)
    ...
    ΣXnY = a0 ΣXn + a1 ΣX1Xn + a2 ΣX2Xn + ... + an ΣXn²

The above equations are known as the normal equations. They can be solved by the elimination method, by the matrix method, or using determinants (Cramer's rule).
Matrix Method

In matrix form the normal equations can be written as

    A X = B

where A is the matrix of sums of squares and cross products of the X's (with N and the ΣXi in the first row and column), X = (a0, a1, ..., an)' is the vector of unknown coefficients, and B = (ΣY, ΣX1Y, ..., ΣXnY)'. X can be obtained from the above matrix equation and then substituted in the multiple regression equation

    Y = a0 + a1X1 + a2X2 + ... + anXn
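For two predictors the normal equations can be set up and solved directly; the data below are hypothetical, and `numpy` is an assumed tool.

```python
# Solving the normal equations A x = B for a two-predictor regression
# y = a0 + a1*x1 + a2*x2.  The data are hypothetical.
import numpy as np

x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 5.0])
y  = np.array([4.1, 5.0, 8.9, 9.1, 12.2])
N = len(y)

# matrix of the normal equations and its right-hand side
A = np.array([[N,        x1.sum(),      x2.sum()],
              [x1.sum(), (x1*x1).sum(), (x1*x2).sum()],
              [x2.sum(), (x1*x2).sum(), (x2*x2).sum()]])
B = np.array([y.sum(), (x1*y).sum(), (x2*y).sum()])

a0, a1, a2 = np.linalg.solve(A, B)
print(f"y = {a0:.3f} + {a1:.3f} x1 + {a2:.3f} x2")

# the same coefficients come out of least squares directly
coef, *_ = np.linalg.lstsq(np.column_stack([np.ones(N), x1, x2]), y,
                           rcond=None)
print("lstsq agrees:", np.allclose([a0, a1, a2], coef))
```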
Calculation of R² and Standard Error of Estimate (S²)

    Total sum of squares (TSS) = Σy² = ΣY² − (ΣY)²/N

    Σx1y = ΣX1Y − (ΣX1 ΣY)/N,  Σx2y = ΣX2Y − (ΣX2 ΣY)/N, ...,
    Σxny = ΣXnY − (ΣXn ΣY)/N

    Sum of squares due to regression (RSS) = a1 Σx1y + a2 Σx2y + ... + an Σxny

    S² = (TSS − RSS) / (N − k − 1)

    R² = RSS / TSS

The significance of R can be seen from the table with d.f. = N − 2, or by using the F test, the F statistic being

    F = (R²/k) / [ (1 − R²)/(N − k − 1) ]   with (k, N − k − 1) d.f.
Significance of the partial regression coefficients (ai's)

The 't' statistic is given by

    t = ai / (s √cii)   with d.f. = N − k − 1

where cii is the corresponding diagonal element of the inverse of the matrix A, 'k' is the number of independent variables and N is the total number of observations.
1. If R² is significant, it means that the independent variables (Xi's) studied are enough to explain the variation in the dependent variable (y); e.g. if R² = 0.65, it means that 65% of the variation in the dependent variable is due to the independent variables studied.
2. If R² is not significant, it means that the independent variables (Xi's) studied are not enough to explain the variation in y.
3. If a partial regression coefficient (ai) is significant, it means that for one unit of increase in Xi we will have an increase of ai units in y.
4. If a partial regression coefficient (ai) is not significant, it means that by increasing Xi by one unit we do not get a significant increase in y.
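R², its F test and the d.f. given above can be computed for a fitted regression in a few lines; the data below are hypothetical, and `numpy`/`scipy` are assumed tools.

```python
# R-squared and its F test for a fitted multiple regression, following
# the formulas above (k independent variables, N observations).
import numpy as np
from scipy import stats

x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
y  = np.array([3.9, 5.2, 9.1, 9.0, 13.2, 12.1])
N, k = len(y), 2

X = np.column_stack([np.ones(N), x1, x2])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ coef

tss = ((y - y.mean())**2).sum()
rss = ((y_hat - y.mean())**2).sum()       # sum of squares due to regression
r2 = rss / tss

F = (r2 / k) / ((1 - r2) / (N - k - 1))
p = stats.f.sf(F, k, N - k - 1)
print(f"R^2 = {r2:.4f}, F = {F:.2f} with ({k}, {N - k - 1}) d.f., p = {p:.4f}")
```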
Consider the population regression model Y = α0 + α1X1 + ... + αnXn + e, where σi² denotes the variance of Xi. The quantity αi²σi²/σy² measures the fraction of the variance of Y attributable to its linear regression on Xi. With a random sample from this population, the quantities ai²Σxi²/Σy² are sample estimates of these fractions. (In small samples a correction for bias might be advisable, since ai²Σxi²/Σy² is not an unbiased estimate of αi²σi²/σy².)

The square roots of these quantities, ai √(Σxi²/Σy²), called the standard partial regression coefficients, have sometimes been used as measures of relative importance, the X's being ranked in order of the sizes of these coefficients (ignoring signs). The quantity √(Σxi²/Σy²) is regarded as a correction for scale. The coefficient estimates αi σi/σy, the change in Y as a fraction of σy produced by a change of one S.D. in Xi.
PATH ANALYSIS
It is a technique to study the direct and indirect effects of independent variables on the dependent variable.

The direct effects are the contributions of an independent variable to the dependent variable on its own, and the indirect effects are its contributions in association with some other independent variable.
In general, if we have 'n' independent variables X1, X2, X3, ..., Xn and the dependent variable Y, and if the direct effects are denoted by P11, P22, P33, ..., Pnn, then we have the following 'n' equations:

    P11 + P22 r12 + P33 r13 + ... + Pnn r1n = r1y
    P11 r12 + P22 + P33 r23 + ... + Pnn r2n = r2y
    ...
    P11 r1n + P22 r2n + P33 r3n + ... + Pnn = rny

The indirect effect of the ith variable via the jth variable is Pij = Pjj rij; e.g. the indirect effect of the 1st variable via the 2nd variable (P12) = P22 r12, and the indirect effect of the 1st variable via the 3rd variable (P13) = P33 r13.
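The n simultaneous equations can be written compactly as R p = r_y, where R is the correlation matrix of the independent variables and r_y their correlations with Y, and solved in one step; the correlation values below are hypothetical.

```python
# Direct path coefficients from the n simultaneous equations R p = r_y,
# where R is the correlation matrix of the independent variables and
# r_y their correlations with Y.  The values are hypothetical.
import numpy as np

R = np.array([[1.0, 0.4, 0.3],
              [0.4, 1.0, 0.5],
              [0.3, 0.5, 1.0]])
r_y = np.array([0.6, 0.7, 0.5])

p = np.linalg.solve(R, r_y)              # direct effects P11, P22, P33
print("direct effects:", np.round(p, 4))

# indirect effect of variable 1 via variable 2: P22 * r12
print("indirect effect of X1 via X2:", round(p[1] * R[0, 1], 4))

# each r_iy decomposes into a direct effect plus the indirect effects
print("decomposition check:", np.allclose(R @ p, r_y))
```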
Path analysis is a technique that uses linear regression models to test specific theories of causal relationships among a set of variables. It involves looking not only for relationships among variables but also for causal relationships. Two variables that are both causally dependent on a third one will themselves be associated.
Fig: X and Y are causally dependent on Z, so the association between X and Y disappears when Z is controlled. As shown in the figure, if the association between two variables disappears under control, one can conclude that there is not a causal relationship between them. If the association does not disappear, though, we cannot necessarily conclude that the relationship is causal, since the relationship could disappear when other variables (perhaps unknown to us) are controlled. Thus we can prove noncausality, but we can never prove causality.
Path Diagram:
In developing theoretical explanations of cause-effect relationships, we might hypothesize a system of relationships in which some variables believed to be caused by others may in turn have effects on other variables. A single multiple regression model may be insufficient for such a system, since it can handle only one dependent variable. Path analysis utilizes the number of regression models necessary to include all proposed relationships in the theoretical explanation.

For example, suppose that our theory specifies that one's educational attainment depends on several factors, in particular upon one's parents' income level, one's intelligence and one's motivation to achieve. We might hypothesize, in addition, that one's motivation to achieve depends on several other (prior) factors, among them the parents' educational level and the student's general intelligence level; that the income of the parents depends in part on the parents' educational level; and that educational attainment may also depend directly on the student's intelligence.
I
Fig. shows a graphic summary of the theory just outlined, in the form of "a path diagram"
The path coefficients measure the strength of association between variables, controlling for the other variables in the sequence, and the sign of the influence. Their interpretation is simply that of multiple regression b* coefficients: a one standard deviation change in the independent variable corresponds to a b* standard deviation change in the dependent variable, controlling for the other independent variables in that particular regression equation.

An unmeasured residual variable path is usually attached to each dependent variable in the path diagram to account for the variation unexplained by its independent variables. Each residual variable represents the remaining portion (1 − R²) of the unexplained variation in its corresponding regression equation with that dependent variable; its path coefficient equals √(1 − R²). Every dependent variable will have a residual path associated with it. It is assumed that the residual factors are uncorrelated with the other independent variables in the system, and with the other residuals associated with other dependent variables in the system.
Most path models will have variables that are dependent on some other variables but are, in turn, causes of other dependent variables. These variables are sometimes labeled intervening variables, since they occur in sequence between other variables. Thus, in the example, the child's achievement motivation intervenes between the child's intelligence and the child's educational achievement. This means that, if the theory is correct, the child's intelligence affects his or her educational achievement in part through its effect on achievement motivation; its effect in this sense is indirect. However, the model also proposes that the child's intelligence has a direct effect on his or her educational achievement over and above the effect through achievement motivation. By performing the regression analysis, we can test whether this is true. For example, if intelligence affects educational attainment only through its effect on motivation, then the direct path (controlling for motivation) will have a nonsignificant path coefficient. However, if intelligence works both directly and indirectly, then all three coefficients of the paths leading from intelligence to educational attainment should be significant. If we do find a nonsignificant path, then we can erase that path from the diagram and perform the appropriate analysis again to reestimate the coefficients of the remaining paths.
In the figure the path coefficients have been shown, which is the form seen in the research literature. The residual variables for the three dependent variables are denoted by R1, R2 and R3. If 28% of the variation in the child's educational attainment were explained by its three predictors, for example, then the path coefficient of the residual variable R1 for the child's educational attainment would be √(1 − R²) = √(1 − 0.28) = 0.85. It appears from the figure that, of the three direct predictors, the achievement motivation of the child has the strongest partial effect on his or her educational attainment (controlling for the child's intelligence and the parents' income). The child's intelligence has a moderate indirect effect, through increasing achievement motivation, as well as a direct effect on educational attainment. The parents' income is not as important as the child's achievement motivation or intelligence in determining the child's educational attainment, but the parents' educational level has an important effect on the child's achievement motivation. Of course, such conclusions would have to be weakened or modified if there were substantial sampling error in the path coefficients.
In summary, the basic steps in a path analysis are as follows:

1. Set up a preliminary theory to be tested, drawing the path diagram without the path coefficients.
2. Do the necessary regression modeling to estimate the path coefficients and the residual coefficients.
3. Evaluate the model, perhaps erasing nonsignificant paths and recalculating the path coefficients for the new model.
DESIGN OF EXPERIMENT
Design of experiment means planning an experiment. It may be defined as the logical construction of an experiment in which the degree of uncertainty with which the inference is drawn may be well defined.

The subject matter of the design of experiment is:

1. Planning of the experiment
2. Obtaining relevant information from it regarding the statistical hypothesis under study

Experimental unit

The smallest division of the experimental material to which a treatment is applied, and on which the observation on the variable under study is made, is termed the experimental unit. E.g., in a nutrition experiment, a group of pigs in a pen.
Blocks
In an agricultural experiments, we divide whole experiment unit into relatively
homogeneous subgroups or strata. This strata which are more uniform among themselves than
the field as a whole are known as blocks. In animal husbandry experiments breed can be taken
as blocks.
Yield or response

The measurement of the variable under study on the different experimental units is termed the yield, i.e. it is the outcome of the experiment.

Experimental error

It is the unit-to-unit variation within the same treatment group, and it is the measure of variation due to uncontrollable or unassignable causes.
No two experimental units, even when treated alike, can be expected to yield identical results.

The basic principles of design of experiment are:

1. Replication
2. Randomization
3. Local control

Replication

It refers to the number of repetitions of the treatments; it means the execution of the treatments more than once. In other words, the repetition of the treatments under investigation is known as replication.

An experimenter resorts to replication in order to average out the influence of chance factors on the different experimental units. Thus the repetition of treatments results in a more reliable estimate than is possible with a single observation.
Advantages:

Replication serves to reduce the experimental error and thus enables us to obtain a more precise estimate of the treatment effects. We know that the standard error of the mean of a sample of size 'n' is

    S.E. = S.D. / √n

The precision of a design is given by 1/S.E. As 'n' increases, S.E. decreases and hence the precision increases.
Local control

Local control means grouping the experimental units into homogeneous sets so that the variation among the groups can be removed from the experimental error.

1. In the randomized block design (RBD) there is local control in one direction, with one criterion.
2. In the Latin square design (LSD) there is local control with two criteria, in two directions.

Besides these three principles, we have two more considerations in any experimental design:

1. Auxiliary variable
2. Control
Auxiliary Variable:
In any experiment there may be some initial variables which may influence the
response of our experiment. For example, in weight gain studies, initial weight will be the
auxiliary variable. We have to choose all the auxiliary variable and record their values before
applying the treatments.
Control:

A control is a standard or untreated group of experimental units to which no treatment is applied, against which the treatment effects are compared.
COMPLETELY RANDOMIZED DESIGN (CRD)

In this design the treatments are allotted to the experimental units wholly at random.

Advantages:
1. The design is very flexible, since we can use any number of treatments and any number of replications without complicating the statistical analysis.
2. The statistical analysis remains simple even if some or all of the observations for any treatment are lost; we merely carry out the statistical analysis with the available data. Moreover, the loss of information due to missing data is smaller in comparison with any other design.
Disadvantages:
1. If the experimental units are not homogeneous, the error variance will be larger, which makes the design less efficient and results in less sensitivity in detecting significant effects.
Applications:
1.
2.
Statistical Analysis of CRD

The model is

    Yij = μ + ti + eij

where Yij is the response value of the jth unit receiving the ith treatment, μ is the general mean effect, ti is the effect of the ith treatment, and eij is the error effect due to chance, the eij being identically and independently distributed (i.i.d.) normally with mean 0 and variance σe², written i.i.d. N(0, σe²).

Let us consider the case of a C.R.D. with 't' treatments, tabulated as follows:
    Tr1    Tr2   ...   Tri   ...   Trt
    Y11    Y12   ...   Y1i   ...   Y1t
    Y21    Y22   ...   Y2i   ...   Y2t
    Y31    Y32   ...   Y3i   ...   Y3t
    ...
    Yn11   Yn22  ...   Ynii  ...   Yntt
    -----------------------------------
    T1     T2    ...   Ti    ...   Tt    (treatment totals)

Here treatment i is replicated ni times, and Ti is the total of the ni observations under treatment i.
Step 1 : Find the treatment totals T1, T2, ..., Tt.

Step 2 : Find the grand total G = T1 + T2 + ... + Tt and the correction factor C.F. = G²/N, where N = n1 + n2 + ... + nt.

Step 3 : Calculation of the sums of squares:

    i)   Total sum of squares (T.S.S.) = Σ Σ (Yij − Ȳ)² = Σ Σ Yij² − C.F.
    ii)  Treatment sum of squares (Tr.S.S.) = T1²/n1 + T2²/n2 + ... + Ti²/ni + ... + Tt²/nt − C.F.
    iii) Error sum of squares (E.S.S.) = T.S.S. − Tr.S.S.
Step 4 : Formation of the ANalysis Of VAriance table (ANOVA or AOV table)

    ANOVA
    Source of variation   d.f     S.S       Mean square = S.S/d.f       F
    Between treatments    t − 1   Tr.S.S.   Tr.M.S = Tr.S.S./(t − 1)    F = Tr.M.S/E.M.S
    Within treatments
      (Error)             N − t   E.S.S.    E.M.S = E.S.S./(N − t)
    Total                 N − 1   T.S.S.

    Error d.f. = (N − 1) − (t − 1) = N − t
Step 5:
Interpretation
Case 1. If cal F < Tab F for d.f. = (t − 1), (N − t) at the 5% level, F is not significant, F = ( ) NS, and Ho is accepted.
Case 2. If cal F > Tab F for d.f. = (t − 1), (N − t) at the 5% level, F is significant, F = ( )*, and Ho is rejected.
Case 3. If cal F > Tab F for d.f. = (t − 1), (N − t) at the 1% level, F is highly significant, F = ( )**, and Ho is rejected.
In cases 2 and 3, when we reject Ho, we have to test the significance of the differences between the treatments among themselves, and we work out the critical difference (C.D.) between any two treatments:

    S.E. of the difference between any two treatments = √[ EMS (1/ni + 1/nj) ]

    C.D. = √[ EMS (1/ni + 1/nj) ] × t at the 5% (or 1%) level for the error d.f.

Work out the treatment means,

    Tr̄1 = T1/n1, Tr̄2 = T2/n2, ..., Tr̄i = Ti/ni, ..., Tr̄t = Tt/nt,

and find the difference between the means of the treatments; if it is greater than the C.D., declare them significantly different, and if it is less than the C.D., declare them non-significant.

Bar chart representation

Write the treatment means in ascending order and draw a bar above the treatments which do not differ significantly.

Note: In CRD, if the replications are equal, i.e. n1 = n2 = ... = nt = n (say),

    Tr.S.S. = Σ Ti²/n − C.F.

    S.E. of the difference between any two treatments = √(2EMS/n)
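The CRD sums of squares and the F ratio above can be computed step by step and checked against a library routine; the treatment data below are hypothetical, and `scipy` is an assumed tool.

```python
# One-way ANOVA for a CRD with unequal replication, following the
# correction-factor and sums-of-squares steps above.  Data hypothetical.
from scipy import stats

treatments = {
    "Tr1": [20, 22, 23, 21],
    "Tr2": [25, 27, 26],
    "Tr3": [18, 17, 20, 19, 18],
}

groups = list(treatments.values())
N = sum(len(g) for g in groups)
t = len(groups)
G = sum(sum(g) for g in groups)
cf = G**2 / N                                       # correction factor

tss = sum(x**2 for g in groups for x in g) - cf
trss = sum(sum(g)**2 / len(g) for g in groups) - cf
ess = tss - trss

trms = trss / (t - 1)
ems = ess / (N - t)
F = trms / ems
print(f"F = {F:.3f} with ({t - 1}, {N - t}) d.f.")

# scipy's one-way ANOVA gives the same F ratio
F_sp, p = stats.f_oneway(*groups)
print(f"scipy F = {F_sp:.3f}, p = {p:.5f}")
```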
RANDOMIZED BLOCK DESIGN (RBD)

In a randomized block design, treatments are allocated at random to the units within each block. The variation among blocks is removed from the variation due to error; hence one source of variation is controlled by stratification, and the experimenter prefers a randomized block design to a completely randomized design.

Disadvantages

1. When data from some individual units are missing, we have to use the missing plot technique to estimate the missing values and then carry out the analysis of variance (ANOVA). If the missing observations are many, this design is less convenient than the completely randomized design.
2. In each block we must have a number of experimental units equal to the number of treatments or a multiple of the number of treatments. If we have 't' treatments and 'b' blocks, the total number of experimental units needed is N = b × t.
3. The efficiency of the design decreases as the number of treatments, and hence the block size, increases.
Statistical Analysis of RBD

The model is

    Yij = μ + ti + bj + eij

where Yij is the response value of the experimental unit receiving the ith treatment in the jth block, μ is the general mean effect, ti the effect of the ith treatment, bj the effect of the jth block, and eij the error effect, i.i.d. N(0, σe²).

The data are tabulated with blocks as rows and treatments as columns:

               Tr1    Tr2   ...   Tri   ...   Trt  | Total
    Block 1    Y11    Y21   ...   Yi1   ...   Yt1  |  B1
    Block 2    Y12    Y22   ...   Yi2   ...   Yt2  |  B2
    ...
    Block j    Y1j    Y2j   ...   Yij   ...   Ytj  |  Bj
    ...
    Block b    Y1b    Y2b   ...   Yib   ...   Ytb  |  Bb
    Total      T1     T2    ...   Ti    ...   Tt   |  G

Step 1 : Find i) the treatment totals T1, T2, ..., Tt, ii) the block totals B1, B2, ..., Bb, and iii) the grand total G.

Step 2 : Correction factor C.F. = G²/(b × t).
Step 3 : Calculation of the sums of squares

    i)   Total sum of squares (T.S.S) = sum of squares of the response values of all N experimental units − C.F. = Σ Σ Yij² − C.F.
    ii)  Treatment sum of squares (Tr.S.S) = Σ Ti²/b − C.F.
    iii) Block sum of squares (B.S.S) = Σ Bj²/t − C.F.
    iv)  Error sum of squares (E.S.S) = T.S.S − Tr.S.S − B.S.S
Step 4 : Formation of the ANOVA table

    Source of variation   d.f              S.S     M.S                          F
    Between treatments    t − 1            Tr.S.S  TrMS = TrSS/(t − 1)          TrMS/EMS
    Between blocks        b − 1            B.S.S   BMS = BSS/(b − 1)            BMS/EMS
    Error                 (t − 1)(b − 1)   E.S.S   EMS = ESS/[(t − 1)(b − 1)]
    Total                 bt − 1           T.S.S

    Error d.f. = (bt − 1) − (t − 1) − (b − 1) = (t − 1)(b − 1)
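The RBD ANOVA above can be sketched for a small blocks × treatments table; the yields below are hypothetical, and `numpy`/`scipy` are assumed tools.

```python
# ANOVA for an RBD laid out as a blocks x treatments table, following
# the sums of squares in the table above.  The yields are hypothetical.
import numpy as np
from scipy import stats

# rows = blocks (b = 3), columns = treatments (t = 4)
Y = np.array([[12.0, 15.0, 14.0, 11.0],
              [13.0, 17.0, 15.0, 12.0],
              [11.0, 16.0, 13.0, 10.0]])
b, t = Y.shape
G = Y.sum()
cf = G**2 / (b * t)

tss  = (Y**2).sum() - cf
trss = (Y.sum(axis=0)**2 / b).sum() - cf     # treatment totals squared / b
bss  = (Y.sum(axis=1)**2 / t).sum() - cf     # block totals squared / t
ess  = tss - trss - bss

trms = trss / (t - 1)
bms  = bss / (b - 1)
ems  = ess / ((t - 1) * (b - 1))

F_tr = trms / ems
F_bl = bms / ems
p_tr = stats.f.sf(F_tr, t - 1, (t - 1) * (b - 1))
print(f"treatments: F = {F_tr:.2f}, p = {p_tr:.4f}")
print(f"blocks:     F = {F_bl:.2f}")
```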
Interpretation

I) Treatments:

    i)   If cal F < Tab F at the 5% level for d.f. = (t − 1), (t − 1)(b − 1), F is not significant, F = ( ) NS, and Ho is accepted.
    ii)  If cal F > Tab F at the 5% level for d.f. = (t − 1), (t − 1)(b − 1), F is significant, F = ( )*, and Ho is rejected.
    iii) If cal F > Tab F at the 1% level for d.f. = (t − 1), (t − 1)(b − 1), F is highly significant, F = ( )**, and Ho is rejected.

The critical difference between any two treatments at the 5% or 1% level is

    C.D. = √(2EMS/b) × t at the 5% or 1% level for the error d.f.

Work out the treatment means; if the difference between any two treatment means is less than the critical difference, they do not differ significantly, and if it is greater than the critical difference (C.D.), they differ significantly.
II) Blocks:

    i)   If cal F < Tab F at the 5% level for d.f. = (b − 1), (t − 1)(b − 1), F is not significant and the blocks do not differ.
    ii)  If cal F > Tab F at the 5% level for d.f. = (b − 1), (t − 1)(b − 1), F is significant and Ho is rejected.
    iii) If cal F > Tab F at the 1% level for d.f. = (b − 1), (t − 1)(b − 1), F is highly significant, F = ( )**, and Ho is rejected.

The critical difference between any two blocks is C.D. = √(2EMS/t) × t at the 5% or 1% level for the error d.f. If the difference between two block means is less than the critical difference, the blocks do not differ significantly, and if it is greater than the critical difference, the blocks differ significantly.
Note:

1. If there is no significant difference between the blocks, we have not gained anything by using RBD; we have only lost the block d.f. = b − 1 from the error d.f.
2. If there is significance for both blocks and treatments, or non-significance for both blocks and treatments, then the interpretation is a valid one.

Missing plot technique in RBD

If the value for one experimental unit is missing, it is estimated by

    X = (t T' + b B' − G') / [ (t − 1)(b − 1) ]

where T' is the total of the remaining observations of the treatment with the missing value, B' the total of the remaining observations of its block, and G' the grand total of the available observations. We have to subtract 1 d.f. from the Total, and subsequently the Error d.f. will be reduced by 1.
The standard error of the difference between the mean of the treatment with the missing value and the mean of another treatment is

    S.E. = √{ EMS [ 2/b + t / (b (t − 1)(b − 1)) ] }

and the critical difference is

    C.D. = √{ EMS [ 2/b + t / (b (t − 1)(b − 1)) ] } × t at the 5% or 1% level for the error d.f.

If two values are missing, say 'X' and 'Y', we have to guess a suitable value for one of them, say 'X', then calculate 'Y'; then, taking that value for 'Y', we have to calculate 'X', and repeat the process till we get two close values for each of the unknowns 'X' and 'Y'.

Here we have to subtract

    [B − (t − 1)X]² / [t (t − 1)²]   and   [B − (t − 1)Y]² / [t (t − 1)²]

from the Tr.S.S., where B is the corresponding block total.
If the blocking has been effective in increasing the precision of the experiment, the relative efficiency (R.E.) will be greater than 1. The quantity (R.E. − 1) × 100 gives the percentage gain in precision due to blocking.
LATIN SQUARE DESIGN (LSD)

When the available experimental material is known to be subject to two major sources of variation, the experimental units are grouped according to these two sources of variation, so as to have a two-way elimination of variability in the experimental units. The double groupings (double blockings) are called rows and columns. In each row and each column every treatment is applied once. This leads to an arrangement of 't' treatments in a square of 't' rows and 't' columns such that every treatment is allotted once in every row and once in every column. Such a design is called a Latin square design (LSD). The number of experimental units for an LSD of 't' treatments is t × t, i.e. t².

With 5 treatments the number of experimental units needed is 25; with 6 treatments it is 36, and so on. When more experimental units are required for the experiment, the units are likely to be heterogeneous and the allotment of treatments is complicated. For fewer than 5 treatments the error d.f. will be small (it is advisable to have a minimum error d.f. of 12 for a valid conclusion). In general, LSD is adopted for 5 to 8 treatments.
Statistical Analysis

The response value Yijk corresponding to the ith treatment in the jth row and kth column is of the form

    Yijk = μ + ti + rj + ck + eijk

where μ is the general mean effect, ti, rj, ck are the effects due to treatment, row and column, and eijk is the error effect, which is identically and independently distributed normally with constant variance σe², i.e. i.i.d. N(0, σe²).
Randomisation in LSD

The allotment of the different treatments to the units, without repetition of any treatment in any row or any column, should be done as follows.

1. Get a random Latin square of the required size 't' × 't', where 't' is the number of treatments, from Table XV of the Statistical Tables for Biological, Agricultural and Medical Research by Fisher and Yates.
2. The choice of the random Latin square is decided by a random number which is less than the total number of available squares in the table.
3. Number the columns 1, 2, ..., 't' and get a random arrangement of the numbers 1, 2, ..., t; rearrange the columns accordingly.
4. Keeping the first row fixed, number the remaining rows 1, 2, 3, ..., (t − 1), and rearrange the rows accordingly.

The Latin square so obtained is one in which the treatments are allotted randomly to the experimental units, which have been grouped according to the two-way variation among the units.
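In place of drawing a square from the published tables, a randomized Latin square can also be generated programmatically, as a rough equivalent of the permutation steps above; the treatment labels below are hypothetical.

```python
# Randomizing a Latin square: start from a cyclic t x t square, then
# permute rows and columns at random.  Treatment labels are hypothetical.
import random

t = 5
treatments = ["T1", "T2", "T3", "T4", "T5"]

# standard cyclic Latin square: cell (i, j) gets treatment (i + j) mod t
square = [[treatments[(i + j) % t] for j in range(t)] for i in range(t)]

random.shuffle(square)                        # permute the rows
cols = list(range(t))
random.shuffle(cols)                          # permute the columns
square = [[row[j] for j in cols] for row in square]

for row in square:
    print(" ".join(row))

# each treatment still occurs exactly once in every row and column
assert all(len(set(row)) == t for row in square)
assert all(len({square[i][j] for i in range(t)}) == t for j in range(t))
```

Permuting whole rows and whole columns preserves the Latin property, which is why the asserts at the end always hold.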
1. Find the treatment totals T1, T2, ..., Tt.
2. Find the row totals R1, R2, ..., Rt.
3. Find the column totals C1, C2, ..., Ct.
4. Grand total G = sum of the row totals (or) sum of all the column totals (or) sum of all the treatment totals (or) sum of all the response values of the experimental units. Correction factor C.F. = G²/t².
5. Calculation of the sums of squares:

    i)   T.S.S = Σ Σ Σ Yijk² − C.F.
    ii)  Tr.S.S = Σ Ti²/t − C.F.
    iii) R.S.S = Σ Rj²/t − C.F.
    iv)  C.S.S = Σ Ck²/t − C.F.
    v)   E.S.S = T.S.S − Tr.S.S − R.S.S − C.S.S
Formation of the ANOVA table

    Source of variation   d.f              S.S    M.S                          F
    Treatments            t − 1            TrSS   TrMS = TrSS/(t − 1)          TrMS/EMS
    Rows                  t − 1            RSS    RMS = RSS/(t − 1)            RMS/EMS
    Columns               t − 1            CSS    CMS = CSS/(t − 1)            CMS/EMS
    Error                 (t − 1)(t − 2)   ESS    EMS = ESS/[(t − 1)(t − 2)]
    Total                 t² − 1           TSS

    Error d.f. = total d.f. − (treatment d.f. + row d.f. + column d.f.)
               = t² − 1 − [(t − 1) + (t − 1) + (t − 1)] = (t − 1)(t − 2)
Interpretation:
1. If cal F < Tab F for d.f. = (t − 1), (t − 1)(t − 2) at the 5% level, F is not significant, F = ( ) NS, and Ho is accepted.
2. If cal F > Tab F for d.f. = (t − 1), (t − 1)(t − 2) at the 5% level, F is significant, F = ( )*, and Ho is rejected.
3. If cal F > Tab F for d.f. = (t − 1), (t − 1)(t − 2) at the 1% level, F is highly significant, F = ( )**, and Ho is rejected.
In the last two cases we have to work out the critical difference (C.D.) between any two treatments (rows or columns) at the 5% and 1% levels, which is given by

    C.D. = √(2 EMS / t) x t for error d.f. at the 5% (1%) level

Then we work out the treatment means, row means and column means; if the difference between any two treatment (row or column) means is less than the C.D., they do not differ significantly. If it is greater than the C.D., they differ significantly.
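The LSD computations above can be sketched in Python. The function follows the formulas in this section; the 3 x 3 square and the yields below are purely illustrative.

```python
def lsd_anova(square, response):
    """ANOVA for a t x t Latin Square Design.

    square[i][j]   - treatment index (0..t-1) in row i, column j
    response[i][j] - observed response in row i, column j
    """
    t = len(square)
    G = sum(sum(row) for row in response)            # grand total
    CF = G * G / t**2                                # correction factor
    TSS = sum(y * y for row in response for y in row) - CF
    row_tot = [sum(row) for row in response]
    col_tot = [sum(response[i][j] for i in range(t)) for j in range(t)]
    trt_tot = [0.0] * t
    for i in range(t):
        for j in range(t):
            trt_tot[square[i][j]] += response[i][j]
    RSS  = sum(v * v for v in row_tot) / t - CF
    CSS  = sum(v * v for v in col_tot) / t - CF
    TrSS = sum(v * v for v in trt_tot) / t - CF
    ESS  = TSS - (TrSS + RSS + CSS)
    EMS  = ESS / ((t - 1) * (t - 2))                 # error mean square
    F_tr = (TrSS / (t - 1)) / EMS                    # F for treatments
    return dict(TSS=TSS, TrSS=TrSS, RSS=RSS, CSS=CSS, ESS=ESS, EMS=EMS, F_tr=F_tr)

# 3 x 3 Latin square (treatments A=0, B=1, C=2) with hypothetical yields
square   = [[0, 1, 2],
            [1, 2, 0],
            [2, 0, 1]]
response = [[10.0, 14.0, 12.0],
            [15.0, 13.0, 11.0],
            [12.0, 10.0, 16.0]]
res = lsd_anova(square, response)
```

The calculated F for treatments is then compared against the tabulated F with (t-1) and (t-1)(t-2) d.f., exactly as in the interpretation step above.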
Missing Plot Technique in LSD

The missing value (X) is estimated as

    X = [t(R + C + T) - 2G] / [(t-1)(t-2)]

where
    R - row total in which one value is missing
    C - column total in which one value is missing
    T - treatment total in which one value is missing
    G - grand total (all totals computed without the missing value)
    t - number of treatments

The treatment sum of squares computed with the estimated value is biased upwards by

    [G - R - C - (t-1)T]² / [(t-1)(t-2)]²

The total d.f will be reduced by one and hence the error d.f will be reduced by one.

The standard error (S.E.) of the difference between any treatment with a missing value and another treatment is given by

    S.E. = √{ EMS [ 2/t + 1/((t-1)(t-2)) ] }
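The missing-value estimate and the bias correction can be put into two small Python helpers. The totals used in the call below are hypothetical, chosen only to show the arithmetic.

```python
def lsd_missing_value(t, R, C, T, G):
    """Estimate of a missing observation in a t x t Latin square.

    R, C, T - row, column and treatment totals containing the missing unit
    G       - grand total (all computed WITHOUT the missing value)
    """
    return (t * (R + C + T) - 2 * G) / ((t - 1) * (t - 2))

def treatment_ss_bias(t, R, C, T, G):
    """Upward bias of the treatment S.S. when the estimate is substituted."""
    return (G - R - C - (t - 1) * T) ** 2 / ((t - 1) * (t - 2)) ** 2

# hypothetical 4 x 4 square with one lost plot
X = lsd_missing_value(t=4, R=30.0, C=28.0, T=33.0, G=120.0)
```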
Advantages of LSD
- With two-way blocking or grouping, LSD controls more of the variation than CRD or RBD.
- LSD is an incomplete three-way layout. Its advantage over the complete three-way layout is that instead of t³ experimental units, we use only t² experimental units.
- The statistical analysis can be carried out easily, even though it is slightly more complicated than that of RBD.
- Even with missing values, the analysis can be done using the missing plot technique.
- More than one factor can be investigated simultaneously, and with fewer trials than more complicated designs.

Disadvantages of LSD
- The fundamental assumption that there is no interaction between the different factors may not be true in general.
- Unlike RBD, in LSD the number of treatments is restricted to the number of replications, and this limits its field of application. It is suitable for a number of treatments between 5 and 8; for more than 10 treatments the design is seldom used, since in that case the square becomes too large and does not remain homogeneous.
- In case of missing values, when several units are missing, the statistical analysis will be more complicated.
FACTORIAL EXPERIMENT
Factorial Experiment is only an experiment and not a design. It has to be carried out by following any one design: CRD, RBD or LSD. The most frequently used design for factorial experiments is RBD. So far we were considering single factors as treatments. In factorial experiments, treatments consist of combinations of two or more factors, each at two or more levels. The combination of treatments is such that each level of every factor occurs together with each level of every other factor. The number of treatments is the product of the numbers of levels of all factors. If we have two factors each at two levels, we say that it is a 2 x 2 factorial experiment, and if we have three factors each at two levels, then we have a 2 x 2 x 2 (i.e.) 2³ factorial experiment. If all the factors have equal levels, it is a symmetrical factorial experiment. In general, if there are n factors each at m levels, then the experiment is an mⁿ factorial experiment. If we have unequal levels ('a' levels in factor A, 'b' levels in factor B, 'c' levels in factor C, and so on), it is an asymmetrical factorial experiment.

Main effect of a factor is the difference between the mean yields at the levels of that factor, averaging over all levels of all other factors.
Simple effect of a factor at a particular level of the other factors is the difference between the mean yield of the factor at that particular level and in the absence of that particular level.
Interaction between two factors is the variation of difference between mean yields
for different levels of one factor over different levels of other factor. It is the failure of the
differences in response to changes in levels of one factor to be the same at all levels of the
other factor.
Example:
In a 2² factorial experiment, the power denotes the number of factors and the base denotes the number of levels of each factor.

The two factors are A and B, each at two levels: a0, a1 and b0, b1. The treatment combinations are four, viz. a0b0, a1b0, a0b1 and a1b1. Let us suppose that a0b0, a1b0, a0b1 and a1b1 also denote the mean yields (of r replications, say) of the respective treatment combinations from a factorial experiment.
Simple effect:
The response to factor A at the level b0 of B is (a1b0 - a0b0). The response to A at the level b1 of B is (a1b1 - a0b1). These two are called the simple effects of A.
Main effect:
The main effect of A is the average of the simple effects of A:

    A = [(a1b0 - a0b0) + (a1b1 - a0b1)] / 2

Similarly, the average response to B, averaged over both levels of A, is

    B = [(a0b1 - a0b0) + (a1b1 - a1b0)] / 2
Interaction effect:
Apart from the average response to A, we would like to know the differential response to A at different levels of B, if it exists. The measure of this is given by half the difference in the response to A at the levels b1 and b0 of B,

    i.e. AB = [(a1b1 - a0b1) - (a1b0 - a0b0)] / 2

It will be seen that this is also the measure of the differential response to B at the different levels of A. This is termed the interaction between factors A and B and is symbolized AB.
Sum of squares for Main and Interaction Effects:
Between the four treatment combinations there are 3 d.f. These have been partitioned into 3 meaningful single d.f. components:

    Main effect A:   SS = r/4 [(a1b0 + a1b1) - (a0b0 + a0b1)]²   with d.f. = 1
    Main effect B:   SS = r/4 [(a0b1 + a1b1) - (a0b0 + a1b0)]²   with d.f. = 1
    Interaction AB:  SS = r/4 [(a0b0 + a1b1) - (a1b0 + a0b1)]²   with d.f. = 1
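The effect and sum-of-squares formulas for the 2² experiment can be sketched as follows; the four mean yields passed in at the bottom are hypothetical.

```python
def factorial_2x2_effects(a0b0, a1b0, a0b1, a1b1, r):
    """Main effects, interaction and their sums of squares for a 2x2 factorial.

    a0b0 ... a1b1 are the MEAN yields of the four treatment combinations
    over r replications, as in the text.
    """
    A  = ((a1b0 - a0b0) + (a1b1 - a0b1)) / 2          # main effect of A
    B  = ((a0b1 - a0b0) + (a1b1 - a1b0)) / 2          # main effect of B
    AB = ((a1b1 - a0b1) - (a1b0 - a0b0)) / 2          # interaction AB
    SSA  = r / 4 * ((a1b0 + a1b1) - (a0b0 + a0b1)) ** 2
    SSB  = r / 4 * ((a0b1 + a1b1) - (a0b0 + a1b0)) ** 2
    SSAB = r / 4 * ((a0b0 + a1b1) - (a1b0 + a0b1)) ** 2
    return A, B, AB, SSA, SSB, SSAB

# illustrative mean yields for a0b0, a1b0, a0b1, a1b1 with r = 3 replications
A, B, AB, SSA, SSB, SSAB = factorial_2x2_effects(10.0, 14.0, 12.0, 20.0, r=3)
```

Note that each sum of squares equals r times the square of the corresponding effect, since the bracketed contrast is twice the effect.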
Advantages of factorial experiments
- When there is no interaction, all of the simple effects of a factor are equal to its main effect. Hence the main effects are all that are needed to describe the action of a factor.
- Hidden replication: each main effect is estimated with the same precision as if the entire trial had been devoted to that factor alone. It also provides a systematic set of factor combinations.

Disadvantages of factorial experiments
- As the number of factors increases, the size of the experiment becomes very large: with 8 factors each at two levels there are 256 combinations in the factorial experiment. Not only are experiments with too many treatments costly to run, it is also difficult to find sufficient uniform material to form blocks to accommodate all the treatment combinations.
- Large factorials may be difficult to interpret, particularly when interactions are present.
Uses
1.
In experiments where the aim is to examine large number of factors and to determine
which are important and which are not important.
2.
3.
Example: let the two factors be storage temperature (t) and storage period (s), each at two levels:

    t1 = -10°C,  t2 = -20°C
    s1 = 1 month,  s2 = 2 months

The factorial set of treatments in this case will be 4, (i.e.) t1s1, t1s2, t2s1, t2s2. The experiment could be run with an experimental design chosen to fit the conditions. Most often we use RBD; we need blocks with 4 experimental units to conduct the experiment. The data will be analyzed in such a way that the main effects of temperature and time, and the time x temperature interaction, will be estimated and tested.
In practice, the factors at different levels will not behave uniformly. When two factors interact, the response to changes in one factor is conditioned by the level of the other factor.
Data Analysis
To illustrate the process in general, suppose we have two factors say Factor A at 'a'
levels and Factor B at 'b' levels and the experiment is done using a RBD with 'r' blocks, each
containing 'ab' units. The model for an observation in this experiment is given by

    Yijk = μ + ρi + αj + βk + (αβ)jk + εijk

where
    Yijk   - the response value of the jth level of factor A and kth level of factor B in the ith block
    μ      - overall mean yield
    ρi     - effect of the ith block, distributed normally with mean 0 and variance σρ² (ρi ~ N(0, σρ²))
    αj     - added effect of the jth level of factor A, measured as a deviation from μ (Σj αj = 0)
    βk     - added effect of the kth level of factor B, measured as a deviation from μ (Σk βk = 0)
    (αβ)jk - added effect of the combination of the jth level of factor A with the kth level of factor B, i.e. the Aj x Bk interaction effect
    εijk   - random error component, εijk ~ N(0, σe²)
Step 1: Form the two-way table of Factor A x Factor B totals, Tjk = Σi Yijk (total over blocks for the jth level of A and the kth level of B):

                        Factor B
    Factor A    1      2      3    ...    b   | Total
       1       T11    T12    T13   ...   T1b  |  A1
       2       T21    T22    T23   ...   T2b  |  A2
       :
       a       Ta1    Ta2    Ta3   ...   Tab  |  Aa
    ------------------------------------------------
     Total      B1     B2     B3   ...    Bb  |   G

where
    Aj = Σk Tjk = Yj..  (total for the jth level of factor A)
    Bk = Σj Tjk = Y.k.  (total for the kth level of factor B)
    G  = Σj Aj = Σk Bk = Y...  (grand total)

Block totals: R1, R2, ..., Rr, with R1 + R2 + ... + Rr = G.

Step 2: Compute the correction factor

    C.F. = G² / (rab)

Note that A1 + A2 + ... + Aa = B1 + B2 + ... + Bb = R1 + R2 + ... + Rr = G.
Step 3
Calculation of sums of squares

1. Total sum of squares (T.S.S) = sum of squares of all the 'rab' response values - C.F.
                                = Σi Σj Σk Y²ijk - C.F.

2. Block sum of squares (B.S.S) = (R1² + R2² + ... + Rr²) / (ab) - C.F.

3. Sum of squares due to factor A (S.S due to A) = (A1² + A2² + ... + Aa²) / (rb) - C.F.

4. Sum of squares due to factor B (S.S due to B) = (B1² + B2² + ... + Bb²) / (ra) - C.F.

5. Sum of squares due to interaction AB (S.S due to AB) = (1/r) Σj Σk T²jk - C.F. - (S.S due to A) - (S.S due to B)

6. Error sum of squares (E.S.S) = T.S.S - (B.S.S + S.S due to A + S.S due to B + S.S due to AB)
Formation of the ANOVA table

    Sources of    d.f           S.S            M.S
    variation
    Blocks        r-1           B.S.S          BSS/(r-1)
    A             a-1           SS due to A    SSA/(a-1)
    B             b-1           SS due to B    SSB/(b-1)
    AB            (a-1)(b-1)    SS due to AB   SSAB/[(a-1)(b-1)]
    Error         (r-1)(ab-1)   E.S.S          ESS/[(r-1)(ab-1)]
    Total         rab-1         T.S.S
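Steps 1 to 3 above can be sketched in Python. The function follows the formulas in this section; the small r = 2, a = 2, b = 2 data set below is hypothetical.

```python
def factorial_rbd_anova(Y):
    """Sum-of-squares decomposition for an a x b factorial in an RBD.

    Y[i][j][k] - response in block i for level j of factor A and level k of B.
    """
    r, a, b = len(Y), len(Y[0]), len(Y[0][0])
    G  = sum(Y[i][j][k] for i in range(r) for j in range(a) for k in range(b))
    CF = G * G / (r * a * b)                                   # correction factor
    TSS = sum(y * y for blk in Y for row in blk for y in row) - CF
    R = [sum(y for row in Y[i] for y in row) for i in range(r)]            # block totals
    T = [[sum(Y[i][j][k] for i in range(r)) for k in range(b)] for j in range(a)]
    A = [sum(T[j]) for j in range(a)]                          # factor A totals
    B = [sum(T[j][k] for j in range(a)) for k in range(b)]     # factor B totals
    BSS  = sum(v * v for v in R) / (a * b) - CF
    SSA  = sum(v * v for v in A) / (r * b) - CF
    SSB  = sum(v * v for v in B) / (r * a) - CF
    SSAB = sum(v * v for row in T for v in row) / r - CF - SSA - SSB
    ESS  = TSS - (BSS + SSA + SSB + SSAB)
    return dict(CF=CF, TSS=TSS, BSS=BSS, SSA=SSA, SSB=SSB, SSAB=SSAB,
                ESS=ESS, edf=(r - 1) * (a * b - 1))

# hypothetical data: 2 blocks, factor A at 2 levels, factor B at 2 levels
Y = [[[8.0, 10.0], [12.0, 18.0]],
     [[9.0, 11.0], [13.0, 21.0]]]
out = factorial_rbd_anova(Y)
```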
Step 4
Interpretation
Compare the F values for blocks, factor A, factor B and the interaction AB with the corresponding table values and declare them as significant or not. If an F is significant, find the critical difference:

    C.D. between any two levels of factor A at 5% (1%) = √(2EMS/rb) x t at 5% (1%) for error d.f.
    C.D. between any two levels of factor B at 5% (1%) = √(2EMS/ra) x t at 5% (1%) for error d.f.
    C.D. between any two blocks at 5% (1%)             = √(2EMS/ab) x t at 5% (1%) for error d.f.
When a third factor is included in a factorial experiment, the principles of design selection and randomization remain unchanged. The number of treatment combinations, however, increases fairly rapidly, and the analysis of the resulting data becomes somewhat more
complicated. We now have to estimate and test three main effects, three two-factor (first order) interactions and one three-factor (second order) interaction.
Data analysis
Let us suppose that we have factor A at 'a' levels, factor B at 'b' levels and factor C at 'c' levels, and that the experiment is done using an RBD with 'r' blocks, each containing 'abc' experimental units. Let us denote by Yijkl the response value of the jth level of factor A, kth level of factor B and lth level of factor C in the ith block. The model is

    Yijkl = μ + ρi + αj + βk + γl + (αβ)jk + (βγ)kl + (αγ)jl + (αβγ)jkl + εijkl
Table 1: Block totals

    Blocks:   1     2    ...    r
    Totals:   R1    R2   ...    Rr

with Σi Ri = G = Σi Σj Σk Σl Yijkl.
Table 2: Factor A x Factor B totals (summed over blocks and levels of C), T.jk. = Σi Σl Yijkl

                        Factor B
    Factor A     1        2      ...      b     | Total
       1       T.11.    T.12.    ...    T.1b.   |  A1
       2       T.21.    T.22.    ...    T.2b.   |  A2
       :
       a       T.a1.    T.a2.    ...    T.ab.   |  Aa
    ---------------------------------------------------
     Total       B1       B2     ...      Bb    |   G

where Aj = Σk T.jk. and Bk = Σj T.jk.
Table 3: Factor B x Factor C totals (summed over blocks and levels of A), T..kl

                        Factor C
    Factor B     1        2      ...      c     | Total
       1       T..11    T..12    ...    T..1c   |  B1
       2       T..21    T..22    ...    T..2c   |  B2
       :
       b       T..b1    T..b2    ...    T..bc   |  Bb
    ---------------------------------------------------
     Total       C1       C2     ...      Cc    |   G
Table 4
Factor C x Factor A
.............. a
Total
...........T.1.c
Cl
T.2.1
T.1.2
T.2.2
............ T.2.e
C2
Ta.l
Al
T.a.2
A2
........ T. a.c
.............. Aa
Cc
G
Factor A
Factor C
1
T.l1
C
Total
T.j.1 =
Lj Tj.1
Li Lk Yijkl CI =
If we carry out a two-way analysis of variance on each of Table 2, Table 3 and Table 4, we will get the sums of squares due to factor A, factor B and factor C, together with the sums of squares corresponding to the two-factor interactions, (i.e.) the sum of squares due to the A x B interaction, the B x C interaction and the C x A interaction respectively.
Table 5: Factor A x Factor B x Factor C totals, Y.jkl = Σi Yijkl

    Factor A   Factor B           Factor C            Total
                              1       2     ...    c
       1          1        Y.111   Y.112   ...  Y.11c   Y.11.
                  2        Y.121   Y.122   ...  Y.12c   Y.12.
                  :
                  b        Y.1b1   Y.1b2   ...  Y.1bc   Y.1b.
       2          1        Y.211   Y.212   ...  Y.21c   Y.21.
                  2        Y.221   Y.222   ...  Y.22c   Y.22.
                  :
                  b        Y.2b1   Y.2b2   ...  Y.2bc   Y.2b.
       :
       a          1        Y.a11   Y.a12   ...  Y.a1c   Y.a1.
                  2        Y.a21   Y.a22   ...  Y.a2c   Y.a2.
                  :
                  b        Y.ab1   Y.ab2   ...  Y.abc   Y.ab.

Calculation of sums of squares:

    C.F. = G² / (rabc)

    T.S.S. = Σi Σj Σk Σl Y²ijkl - C.F.

    B.S.S. = (R1² + R2² + ... + Rr²) / (abc) - C.F.
From Table 5, we calculate the sum of squares due to the A x B x C interaction as

    SSABC = (1/r) Σj Σk Σl Y².jkl - C.F. - (SSA + SSB + SSC + SSAB + SSBC + SSCA)

Error sum of squares = TSS - (BSS + SSA + SSB + SSC + SSAB + SSBC + SSCA + SSABC)
Table 6
Sources
variation
of
Degrees
Freedom
of
Due to Factor A
a-1
Due to Factor B
b-1
Due to Factor C
c-1
AxB
(a-1) (b-1)
BxC
(b-1) (c-1)
AxC
(a-1) (c-1)
AxBxC
Error
(r-1) (abc-1)
Total
rabc -1
Sum of
squares
Mean
squares
--
69
Critical differences (C.D.), each multiplied by tab. t for error d.f. at the 5% (1%) level:

    Between any two blocks              : √(2EMS/abc)
    Between any two levels of A         : √(2EMS/rbc)
    Between any two levels of B         : √(2EMS/rac)
    Between any two levels of C         : √(2EMS/rab)
    Between any two A x B means         : √(2EMS/rc)
    Between any two B x C means         : √(2EMS/ra)
    Between any two C x A means         : √(2EMS/rb)
    Between any two A x B x C means     : √(2EMS/r)
2. Logarithmic transformation
3. Square root transformation

Angular transformation
This is done when the values are in proportions or percentages. If the percentage values are all between 30 and 70, there is actually no need for the angular transformation. If 0 or 100 occurs, then corresponding to 0 we must take the value (1/4n) x 100, where 'n' is the denominator of the proportion or percentage; corresponding to 100 it is (1 - 1/4n) x 100.
Logarithmic transformation
If the values are exponential (e.g. microbial counts), we have to take the logarithm of all the values and then do the ANOVA. If 0 occurs, add 1 to all the values and then take the logarithm.
Square root transformation
If the values follow a Poisson distribution (e.g. counts), we take the square root of all the values and then do the analysis of variance. If 0 occurs, we add 1/2 (or 1) to all the values and then take the square root, (i.e.) √(x + 1/2) or √(x + 1).
ANALYSIS OF COVARIANCE
2.
3.
4.
In the first two cases, there is no need for ANACOVA. In the last two cases, we have to do ANACOVA.
Uses of ANACOVA
1.
2.
3.
4.
The model is

    Yij = μ + ti + B(Xij - x̄) + Eij

where
    Yij - the response value (Y) of the jth unit receiving the ith treatment
    μ   - overall mean
    ti  - effect of the ith treatment
    B   - regression coefficient of Y on the concomitant variable X
    Eij - random error component, which is independently and identically distributed normally with mean 0 and variance σe²
The sums of squares of X and Y and the sum of products XY are calculated as follows. Let Gx and Gy denote the grand totals of X and Y, N the total number of units, and n the number of units per treatment. Then

    C.Fx  = Gx² / N
    C.Fy  = Gy² / N
    C.Fxy = Gx Gy / N

    Total sum of squares for X (Txx) = Σ X² - C.Fx
    Total sum of squares for Y (Tyy) = Σ Y² - C.Fy
    Total sum of products XY (Txy)   = Σ XY - C.Fxy

    Treatment sum of squares for X (Trxx) = (T1x² + T2x² + ... + Ttx²)/n - C.Fx
    Treatment sum of squares for Y (Tryy) = (T1y² + T2y² + ... + Tty²)/n - C.Fy
    Treatment sum of products XY (Trxy)   = (T1x T1y + T2x T2y + ... + Ttx Tty)/n - C.Fxy

    Error sum of squares for X (Exx) = Txx - Trxx
    Error sum of squares for Y (Eyy) = Tyy - Tryy
    Error sum of products (Exy)      = Txy - Trxy
Table 1: Analysis of sums of squares and products

    Source of variation   d.f    XX     XY     YY
    Treatments            t-1    Trxx   Trxy   Tryy
    Error                 N-t    Exx    Exy    Eyy
    Total                 N-1    Txx    Txy    Tyy

The regression coefficient is estimated from the error line as b = Exy/Exx, and the sum of squares due to regression is (Exy)²/Exx.
Table 2: Regression Analysis

    Source of variation   d.f      S.S                  M.S
    Regression            1        (Exy)²/Exx           (Exy)²/Exx
    Residual              N-t-1    Eyy - (Exy)²/Exx     [Eyy - (Exy)²/Exx] / (N-t-1)
    Error                 N-t      Eyy

    F = [(Exy)²/Exx] / {[Eyy - (Exy)²/Exx] / (N-t-1)}   with d.f. (1, N-t-1)
If F is not significant, then the regression coefficient is not significant and there is no need for ANACOVA. If F is significant or highly significant, then the regression coefficient is significant or highly significant, there is a possibility of influence of X on Y, and ANACOVA is a must.
Table 3: Adjusted Analysis

    Source               d.f    SSx          SSxy         SSy          Adjusted (residual) S.S
    Treatments + Error   N-1    Exx + Trxx   Exy + Trxy   Eyy + Tryy   L = (Eyy + Tryy) - (Exy + Trxy)²/(Exx + Trxx)
    Error                N-t    Exx          Exy          Eyy          M = Eyy - (Exy)²/Exx

Adjusted ANOVA:

    Source       d.f      S.S     M.S
    Treatments   t-1      L - M   (L - M)/(t-1) = adj. TrMS
    Error        N-t-1    M       M/(N-t-1)     = adj. EMS

    F = adj. TrMS / adj. EMS   with d.f. (t-1, N-t-1)
If F is not significant, then the Y's do not differ significantly. If F is significant or highly significant, find the critical difference between any two adjusted treatment means of Y. The adjusted treatment mean is

    adj. Ȳi = Ȳi - b(x̄i - x̄),  where b = Exy/Exx

and the critical difference between two adjusted treatment means is

    C.D. = √{ adj. EMS [ 2/n + (x̄i - x̄j)²/Exx ] } x t for error d.f. at 5% (1%)
ANACOVA in RBD

The model is

    Yij = μ + ti + bj + B(Xij - x̄) + εij

where
    Yij - response value of the ith treatment in the jth block
    μ   - overall mean
    ti  - effect of the ith treatment
    bj  - effect of the jth block
    B   - regression coefficient of Y on X
    εij - random error component

    C.Fx = Gx²/N,   C.Fy = Gy²/N,   C.Fxy = Gx Gy / N,   where N = bt

Let Bix = sum of the X values in the ith block and Biy = sum of the Y values in the ith block. Then

    Block sum of squares for X (Bxx) = Σ Bix²/t - C.Fx
    Block sum of squares for Y (Byy) = Σ Biy²/t - C.Fy
    Block sum of products (Bxy)      = Σ Bix Biy / t - C.Fxy

Calculate Trxx, Tryy, Trxy, Txx, Tyy, Txy as in the previous case. Then

    Exx = Txx - Bxx - Trxx,   Eyy = Tyy - Byy - Tryy,   Exy = Txy - Bxy - Trxy
Table 1: Analysis of sums of squares and products

    Source of variation   d.f          XX     XY     YY
    Blocks                b-1          Bxx    Bxy    Byy
    Treatments            t-1          Trxx   Trxy   Tryy
    Error                 (b-1)(t-1)   Exx    Exy    Eyy
    Total                 bt-1         Txx    Txy    Tyy
Table 2: Regression Analysis

    Source of variation   d.f            S.S                  M.S
    Due to regression     1              (Exy)²/Exx           (Exy)²/Exx
    Residual              (b-1)(t-1)-1   Eyy - (Exy)²/Exx     [Eyy - (Exy)²/Exx] / [(b-1)(t-1)-1]

    F = [(Exy)²/Exx] / {[Eyy - (Exy)²/Exx] / [(b-1)(t-1)-1]}   with d.f. (1, (b-1)(t-1)-1)

If this F is not significant, then there is no need for ANACOVA, (i.e.) there is no significant influence of X on Y. If this F is significant or highly significant, we have to do ANACOVA.
Table 3: Adjusted sums of squares

    S.V.                d.f          SSX          SSXY         SSY          Adjusted (residual) S.S
    Treatment + Error   b(t-1)       Trxx + Exx   Trxy + Exy   Tryy + Eyy   L = (Tryy + Eyy) - (Trxy + Exy)²/(Trxx + Exx)
    Error               (b-1)(t-1)   Exx          Exy          Eyy          M = Eyy - (Exy)²/Exx

Table 4

    Source      d.f            S.S     M.S
    Treatment   t-1            L - M   (L - M)/(t-1)       = adj. TrMS
    Error       (b-1)(t-1)-1   M       M/[(b-1)(t-1)-1]    = adj. EMS

    F = adj. TrMS / adj. EMS

The critical difference between any two adjusted treatment means is

    C.D. = √{ (2 adj. EMS / b) [ 1 + TrMSx/Exx ] } x t for error d.f. at 5% (1%)

where TrMSx = Trxx/(t-1) is the treatment mean square of X.
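The ANACOVA arithmetic for the simpler one-way (CRD) case of the previous section can be sketched in Python. The function follows Tables 1 and 3 above; it assumes equal replication per treatment, and the data at the bottom are hypothetical.

```python
def anacova_crd(x, y, treatments):
    """One-way analysis of covariance (CRD); returns the adjusted F value.

    x, y       - lists of covariate and response values
    treatments - treatment label for each observation (balanced data assumed)
    """
    N = len(y)
    labels = sorted(set(treatments))
    t = len(labels)
    n = N // t                                       # replications per treatment
    Gx, Gy = sum(x), sum(y)
    CFx, CFy, CFxy = Gx * Gx / N, Gy * Gy / N, Gx * Gy / N
    Txx = sum(v * v for v in x) - CFx
    Tyy = sum(v * v for v in y) - CFy
    Txy = sum(a * b for a, b in zip(x, y)) - CFxy
    Tx = {L: sum(xi for xi, Li in zip(x, treatments) if Li == L) for L in labels}
    Ty = {L: sum(yi for yi, Li in zip(y, treatments) if Li == L) for L in labels}
    Trxx = sum(v * v for v in Tx.values()) / n - CFx
    Tryy = sum(v * v for v in Ty.values()) / n - CFy
    Trxy = sum(Tx[L] * Ty[L] for L in labels) / n - CFxy
    Exx, Eyy, Exy = Txx - Trxx, Tyy - Tryy, Txy - Trxy
    L_ = (Tryy + Eyy) - (Trxy + Exy) ** 2 / (Trxx + Exx)   # adjusted treatment + error
    M_ = Eyy - Exy ** 2 / Exx                              # adjusted error
    adj_TrMS = (L_ - M_) / (t - 1)
    adj_EMS = M_ / (N - t - 1)
    return adj_TrMS / adj_EMS                              # F with (t-1, N-t-1) d.f.

# hypothetical balanced data: 2 treatments, 4 units each
x = [3.0, 4.0, 5.0, 6.0, 3.5, 4.5, 5.5, 6.5]
y = [7.0, 8.0, 10.0, 11.0, 9.0, 10.5, 12.0, 13.5]
F_adj = anacova_crd(x, y, ["A"] * 4 + ["B"] * 4)
```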
characteristics have to be defined and the numbers falling in each will have to be counted.
The quality standards are normally set by the makers of the product. Quality consciousness amongst producers is always greater when there is competition from rival producers and when the consumers are quality conscious. The continuing patronage of customers depends a great deal on the maintenance of quality standards.
Statistical Quality Control is a planned collection and effective use of data for studying
causes of variations in quality either as between process, procedures, materials, machines etc.,
or over periods of time. This cause effect analysis is then fed back into the system with a view
to continuous action on the process of handling, manufacturing, packaging, transporting and
delivery at end use.
Different types of Quality Measures
The methods of Statistical Quality Control are used widely in production, storage, packing and transportation, the tests being confined to only a part of the whole lot and at times made only at suitable intervals. This saves much of the time and cost otherwise involved in full inspection, especially when the tests involve the destruction of the product, as in the case of testing the quality of eggs, testing the blood group, etc.
Advantages of Statistical Quality Control
- It has a healthy influence on the workers, for they know that quality is being checked.
- If producers have strict quality control, the users may rely on it and may not resort to a thorough check.
- The quality can be defended or maintained before any Governmental inquiry on the basis of the quality control records.
- The degree of check can be related to the precision required in each process and the past performance, thus economizing the cost of inspection.
- A good deal of data becomes available. The data on the average level of performance and the average range of variability can be used by the management for the choice of plant and machinery as well as technical staff. In other words, these data can help in the evaluation of the men and equipment besides the process and product.
- Incidentally, the efficient working life of machinery can be determined, so that it is discarded at the right time, when it fails to produce goods of the desired specification despite necessary maintenance and adjustments.
Basis of Statistical Quality Control
The basis of Statistical Quality Control is the degree of variability in the size or the magnitude of a given characteristic of the product. Some amount of variability is bound to be there, however scientific and accurate the production process is. The various causes of variation may be classified into:
a) Chance (or random) causes
b) Assignable causes

a) Chance causes:
These causes have nothing to do with any latent or patent defect in the production process. They arise in the process of taking out samples and drawing inferences. It is difficult to assign any specific cause for these variations.
Purpose of Statistical Quality Control
The main purpose of Statistical Quality Control is to separate the assignable causes from the chance or random causes. Here we are more interested in the variations within the sample and not between samples. Statistical methods lay down the limits of chance variations (including the between-sample variations). Any variation beyond those limits must be due to assignable causes. If it is found that the process is out of control, the specific causes may be looked into through technical examination of the production process in its various stages. Even if each stage of the whole production process is under control, statistical methods are necessary to check the uniformity of their standards.
Even if the segregation of assignable and chance causes is not fully achieved, the method of quality control will ensure that the variations are not so serious as to damage the goodwill of the product. Thus the whole basis of statistical quality control is the degree of variability: whether it is within the tolerable limits attributable to chance alone, so that the product is acceptable, or not.
Types of Control
There are two broad ways of statistically controlling the quality of the product, viz. process control and product control.
Process Control
This is concerned with controlling the quality of goods manufactured in the process of
production. Process Control detects whether the production process is going on in the desired
fashion. In other words, it controls quality of the goods to be produced. It ensures that the
machineries are turning out the product of a requisite standard. The Statistical tool applied in
process control is the Control Chart. The primary objectives of process control are (a) to keep
the manufacturing process in control so that the proportion of defective units is not excessive
and (b) to determine whether a state of control exists.
Product Control
This is concerned with the inspection of goods already produced, to decide whether to accept or reject them. Product control in the case of inputs shall ensure that the process is under control. Actually, process control is concerned mostly with operations, machines and hands, while product control is concerned with the quality of the product turned out. Certainly, a good process control will not require a strict product control.
Techniques

    Process Control (by Control Charts):
        Variables  : X̄-Chart, R-Chart
        Attributes : p-chart, np-chart, C-chart
    Product Control (by Sampling Inspection):
        Variables and Attributes sampling plans
Control Chart
A control chart consists of a central line together with upper and lower control limits, on which the quality of successive samples is plotted. As long as the sample point falls within the upper and lower control limits there is nothing to worry about, as in such a case the variation between the samples is attributed to chance or unknown causes. It is only when a sample point falls outside the control limits that it is considered to be a danger signal, indicating that assignable causes are bringing about variations. Thus there is no wastage of time and money in an effort to find the reason for random variation, but as soon as an assignable cause is apparent, necessary corrective action is taken. Generally, if all dots are found between the upper and lower control limits, it is assumed that the process is "in control" and only chance causes are present. However, sometimes dots are found arranged in some peculiar way: successive dots may be located on the same side of the central line or around a control limit, or successive dots may follow a definite path leading towards the upper or lower control limit.
Such patterns of dots within the control limits should also be considered as danger signals, which may indicate a change in the production process. Thus control charts are not only watched for points falling outside the control limits, they are also scrutinized for unusual patterns suggesting trouble.
Parameter: Those characteristics pertaining to the population are called parameters.
Statistic: Those pertaining to the sample are called statistics.
Let "8" be the breaking strength of the material. Let T be the corresponding statistic.
If the process is under control then the value of 8 must be the same for all sub-groups. In fact,
even if the process is under control, there will be small variations in the value of T from one
sub-group to another. These variations are allowable, as they occur due to chance. Let us
now fix a norm for the allowable variations so that if the process is under control, the value of T
will lie between the norm values. If T falls outside these norm values, we say that these are
specific systematic variation. Let J.lT be the mean value of the different groups and
variance. Then the limits are given by J.lT -30 T and
J.lT
0 T 2,
the
the statistic follows the normal distribution with mean J.lT and S.D 0
T.
of normal distribution 99.73% of the observation will lie within the 30 limits. In assigning these
limits the chance to go using is only 0.27% (i.e.) nearly 3 out of 100. The value
J.lT -
30T is
taken as the Lower Control Limit (LCL) and the value J.lT + 30T is taken as the Upper Control
Limit (UCL).
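The 3-sigma reasoning above can be sketched directly: given the statistic values for a set of sub-groups (illustrative numbers below), estimate μT and σT and form the limits.

```python
import math

def three_sigma_limits(stats):
    """3-sigma control limits from observed sub-group statistics.

    Uses the plain mean and standard deviation of the values as estimates
    of the mean and S.D. of the plotted statistic.
    """
    n = len(stats)
    mu = sum(stats) / n
    sigma = math.sqrt(sum((v - mu) ** 2 for v in stats) / n)
    return mu - 3 * sigma, mu, mu + 3 * sigma

# hypothetical sub-group statistics
lcl, centre, ucl = three_sigma_limits([10.0, 10.4, 9.8, 10.2, 9.6])
out_of_control = [v for v in [10.1, 12.5, 9.9] if not lcl <= v <= ucl]
```

Any plotted point landing outside (lcl, ucl), like 12.5 here, is the danger signal described in the text.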
Parts of a control chart
- Quality Scale: This is a vertical scale. The scale is marked according to the quality characteristic (either in variables or attributes) of each sample.
- Plotted Samples: The qualities of individual items of a sample are not shown on a control chart. Only the quality of the entire sample, represented by a single value (a statistic), is plotted. The single value is plotted on the chart in the form of a dot, a cross or a small circle.
- Sample Numbers: The samples plotted on a control chart are numbered individually and consecutively on a horizontal line, usually placed at the bottom of the chart. The samples are also referred to as sub-groups in Statistical Quality Control. Generally 25 sub-groups are used in constructing an X̄ and R control chart.

The success of the control chart technique depends largely upon the efficient grouping of items into samples, such that the variation in quality among items within the same sample is small, but the variation between one sample and another is as large as possible. Such a sample is known as a "rational sub-group". The obvious basis for the selection of sub-groups is the order of production. Here each sub-group will consist of products made during a short period, so that there will not be any remarkable change within that period. The use of such sub-groups is that they will reveal the causes of variation; there may also be causes which will not be revealed by these sub-groups.

- Horizontal Lines: The central line represents the average quality of the samples plotted on the chart. The line above the central line is the Upper Control Limit, which is obtained by adding 3σ to the mean, i.e. mean + 3(S.D.). The line below the central line is the Lower Control Limit, which is mean - 3(S.D.).
Variables are those quality characteristics of a product which are measurable and can be expressed in specific units of measurement, such as the diameter of radio knobs, which can be measured and expressed in centimetres, or the tensile strength of cement, which can be expressed in specific measures per square inch, etc. Attributes, on the other hand, are those product characteristics which are not amenable to measurement. Such characteristics can only be identified by their presence in or absence from the product. For example, we may say that plastic is cracked or not cracked, or whether bottles that have been manufactured contain holes or not. Attributes may be judged either by the proportion of units that are defective or by the number of defects per unit. Thus the data resulting from the inspection of a quality characteristic may take any one of the following forms:
(i) A record of the actual measurements of the quality characteristics for individual articles or specimens.
(ii) A record of the number of articles or specimens inspected and of the number found defective.
(iii) A record of the number of defects that are found in a sample, where the number of opportunities for defects per sample may be very large compared to the average number of defects per sample.

For purposes of control, data of the first form (i) listed above may be summarized by taking two statistical measures: the average (X̄) and the standard deviation (σ), or the average (X̄) and the range (R). Data of the second form (ii) can be summarized in terms of the fraction defective (p), and data of type (iii) can be summarized in terms of the number of defects per unit (c).
Most measurable variables are of continuous type, i.e. of a type whose frequency distribution follows the normal law. For the control of such data, two types of control charts are used: one for the mean of the measurements (X̄-Chart) and another for the range of the measurements (R-Chart).
The data of type (iii) are discrete, such as the number of defects on a glass sheet, the number of surface defects on a floor, etc. In such cases the number of defects on an item may be nil, one, two or more, and the distribution describing the number of items according to the number of defects on them, when the process is in control, will be a Poisson distribution. Under such circumstances the control chart for the average number of defects per item, the C-chart, is used.
For the data of type(ii) Binomial distribution explains the chance variations in the
proportion of defective provided the sample selected from the lot is relatively very small. In
such cases, control chart for proportion of defective (p-chart) is used.
X - Chart
The chart is constructed on the basis of a series of samples drawn frequently during a
production process which are called rational sub-groups. Usually smaller sub-groups of size 4
or 5 units are preferred and at least 25 sub-groups are used in the evaluation of control limits.
Construction of the X̄-Chart
i) Compute the mean (X̄) of each sample and the grand mean (X̄̄) of the sample means; compute the range (R) of each sample and the mean range (R̄).
ii) The 3-sigma control limits are based on the estimate of σ obtained from the mean range, σ = R̄/d2, where d2 is a conversion factor. Writing A2 = 3/(d2 √n), the limits become

    UCL X̄ = X̄̄ + A2 R̄
    LCL X̄ = X̄̄ - A2 R̄

and the values of A2 are given in the table for n = 2 to 15. (When the standard deviation method is used instead, the limits are X̄̄ ± A1 s̄, with A1 from the same tables.)
Then draw the control chart. If no point falls outside the control limits, consider the samples as a homogeneous group. If a point falls beyond the control limits, it is regarded that an assignable cause has thrown the process out of control. The next step is to remove all sample results which are outside the control limits and revise the control limits for the remaining samples. Compare all the remaining plotted points against the revised control limits; until all of the sample means are within the new control limits, the procedure of computation may be repeated.
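The X̄ and R chart limits can be sketched as below. The sub-groups are hypothetical, and the factors A2, D3, D4 are the standard tabulated values for sub-groups of size n = 5.

```python
# control-chart factors for n = 5 (from the standard SQC factor tables)
A2, D3, D4 = 0.577, 0.0, 2.114

def xbar_r_limits(samples):
    """Central lines and 3-sigma limits for the X-bar and R charts."""
    means  = [sum(s) / len(s) for s in samples]
    ranges = [max(s) - min(s) for s in samples]
    xbarbar = sum(means) / len(means)      # grand mean (central line, X-bar chart)
    rbar    = sum(ranges) / len(ranges)    # mean range (central line, R chart)
    return {
        "xbar": (xbarbar - A2 * rbar, xbarbar, xbarbar + A2 * rbar),  # (LCL, CL, UCL)
        "R":    (D3 * rbar, rbar, D4 * rbar),
    }

# five illustrative rational sub-groups of size 5
samples = [[10, 12, 11, 9, 10],
           [11, 13, 12, 10, 11],
           [9, 10, 12, 11, 10],
           [12, 11, 10, 13, 12],
           [10, 11, 9, 10, 12]]
limits = xbar_r_limits(samples)
```

In practice at least 25 sub-groups would be used, as the text notes; five are shown here only to keep the example short.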
R-Chart
It is used to control the variability (dispersion) of the process. Though the standard deviation is the best measure of variation, the range is commonly used in Statistical Quality Control to study the pattern of variation in quality. This is due to the fact that for small samples of size, say, 15 or less, the range provides a good estimate of σ. The R-Chart (or σ-chart) is the companion chart to the X̄-Chart, and both are usually required for an adequate analysis of the production process under study. It is generally presented along with the X̄-Chart.
';c
The required values for constructing R Chart are
a)
b)
c)
Note:
R Chart is used only when the sample size is small. If the size is >12, it is better to
prefer a-chart.
84
C-Chart
The C-Chart is designed to control the number of defects per unit. The control chart for C is used in situations wherein the opportunity for defects is large while the actual occurrence tends to be small; such situations are described by the Poisson distribution. This may occur in the case of the number of imperfections in a piece of cloth, the number of air bubbles in a piece of glass, the number of blemishes on a sheet of paper, etc. Let C stand for the number of defects counted in one unit of cloth (paper, glass, roll of wire) and C̄ for the mean of the defects counted in several (usually 25 or more) such units. The central line of the control chart for C is C̄ and the 3-sigma control limits are

    UCL = C̄ + 3√C̄,    LCL = C̄ - 3√C̄
Note:
The sample size must be uniform while using C-Chart. If there is variation in sample
size and if it is large, then we have to go in for p-chart.
p-Chart
The control chart for p (fraction defective) is used for attribute data. Where the size of the sample varies from sample to sample, the p-chart permits a more straightforward presentation. The only disadvantage is that here we have to calculate p = d/n for each sample. The central line is p̄ and the 3-sigma control limits are

    p̄ ± 3 √[ p̄(1 - p̄)/n ]

It is always preferred to express the results in terms of percent defective rather than fraction defective.
np-Chart
An np-chart shows the actual number of defectives found in each sample. If the number of items inspected on each occasion is the same, plotting the actual number of defectives may be more convenient and meaningful than the fraction defective. The construction and interpretation are similar to those of the p-chart. The central line is np̄ and the control limits are

    np̄ ± 3σnp,   where σnp = √[ np̄(1 - p̄) ]
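The limits for the three attribute charts above can be sketched together; the clamping of a negative lower limit to zero is the usual practical convention, since counts and fractions cannot be negative.

```python
import math

def c_chart_limits(cbar):
    """3-sigma limits for the C chart (defects per unit, Poisson)."""
    s = 3 * math.sqrt(cbar)
    return max(0.0, cbar - s), cbar + s

def p_chart_limits(pbar, n):
    """3-sigma limits for the p chart (fraction defective, sample size n)."""
    s = 3 * math.sqrt(pbar * (1 - pbar) / n)
    return max(0.0, pbar - s), pbar + s

def np_chart_limits(pbar, n):
    """3-sigma limits for the np chart (number of defectives per sample)."""
    s = 3 * math.sqrt(n * pbar * (1 - pbar))
    return max(0.0, n * pbar - s), n * pbar + s

lo, hi = c_chart_limits(4.0)   # e.g. an average of 4 defects per unit
```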
The control charts described above cannot be applied to all types of problems. They are useful only for the regulation of the manufacturing process. Another important field of quality control is acceptance sampling. Inspection for acceptance purposes is carried out at many stages in manufacturing. For example, there may be inspection of incoming materials and parts, process inspection at various points in the manufacturing operations, final inspection by a manufacturer of his own product and, ultimately, inspection of the finished product by one or more purchasers. Much of the acceptance inspection is carried out on a sampling basis. The use of sampling inspection by a purchaser to decide whether or not to accept the product is known as acceptance sampling or sampling inspection. A sample of the product is inspected, and if the number of defective items is more than a stated number known as the acceptance number, the product is rejected. The standards in acceptance inspection are set according to what is required of the product, rather than by the inherent capabilities of the process as in process control. The purpose of acceptance sampling is therefore to decide whether to accept or reject a product. It does not attempt to control quality during the manufacturing process, as do the techniques described earlier. Sampling inspection is referred to as product control, because it is designed to provide decision procedures under which a lot will be accepted or rejected.
The purposes of acceptance sampling are:
1. To determine whether a batch of items, called an inspection lot or simply a lot, that has been delivered by a supplier is of acceptable quality.
2. To make sure that a lot that is complete and ready for shipment is of adequate quality.
3. To determine whether partly completed material is of sufficiently high quality to justify further processing.
Because of the laborious work involved in 100% inspection, a good sampling plan may actually give better quality assurance than 100% inspection. Where quality can be tested only by destroying items, as in determining the strength of glass containers, 100% inspection is out of the question and sampling must be used. But there are situations where 100% inspection is very essential: in testing rifles issued to soldiers, for example, we must test each and every rifle.
Types of inspection
There are two types of inspection: inspection by attributes and inspection by variables. Attribute sampling plans are more popularly used than variables plans.
Since, under a sampling inspection plan, the decision whether to accept or reject a lot is made on the basis of a sample, there is a possibility of (1) rejecting a lot when it is of acceptable quality and (2) accepting a lot when it is in fact below the required quality level. Hence, in any acceptance plan, the producers and consumers, the sellers and the buyers, are exposed to some risks, which are called the producer's and consumer's risks. The producer's risk is the risk a producer takes of a lot being rejected by a sampling plan even though it conforms to requirements. This is equivalent to the concept of Type I error, the probability of rejecting a hypothesis when it is in fact true. The consumer's risk is the risk that a lot will be accepted by a sampling plan when it is in fact below the expected or required standard. It is equivalent to Type II error, which is the probability of accepting a hypothesis when the hypothesis is false.
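The two risks can be quantified for a single sampling plan with the binomial distribution. The plan (n = 20, c = 1) and the quality levels 1% and 10% below are assumed purely for illustration:

```python
from math import comb

def prob_accept(p, n, c):
    """Probability that a lot of fraction defective p is accepted by a
    single sampling plan (n, c): P(binomial(n, p) defectives <= c)."""
    return sum(comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(c + 1))

n, c = 20, 1
producer_risk = 1 - prob_accept(0.01, n, c)  # good lot (1% defective) rejected: Type I
consumer_risk = prob_accept(0.10, n, c)      # poor lot (10% defective) accepted: Type II
```

With this plan a 1%-defective lot is rejected only rarely, but a 10%-defective lot is still accepted fairly often, which is why the two risks must be balanced when a plan is chosen.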
AQL & LTPD
The acceptable quality level (AQL) is the maximum percent defective that can be considered satisfactory as a process average; the producer's risk is usually stated at the AQL. The lot tolerance percent defective (LTPD) is the poorest quality the consumer is willing to accept in an individual lot; the consumer's risk is stated at the LTPD.
Construction of an OC curve
An OC (operating characteristic) curve can be determined by using either the Poisson distribution or the Thorndike chart. The Poisson distribution can be used in all situations where p is less than 0.10 (or np is less than 5) and the lot size is at least 10 times the size of the sample. In a situation in which these conditions are not met, the theoretically correct approach is to use the binomial or the hypergeometric distribution. In most practical situations, however, the Poisson distribution can be used without serious loss of accuracy.
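A few points of an OC curve can be sketched with the Poisson approximation described above. The plan n = 50, c = 2 is an assumed example, not one from the text:

```python
from math import exp, factorial

def oc_poisson(p, n, c):
    """Poisson approximation to a single sampling plan's OC curve:
    P(accept) = P(X <= c) with X ~ Poisson(n*p). Appropriate when
    p < 0.10 and the lot is at least 10 times the sample size."""
    m = n * p
    return sum(exp(-m) * m ** k / factorial(k) for k in range(c + 1))

# Probability of acceptance at several lot qualities for the assumed plan.
points = {p: oc_poisson(p, 50, 2) for p in (0.01, 0.02, 0.05, 0.10)}
```

As expected, the probability of acceptance falls steadily as the lot's true fraction defective rises.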
Single sampling plan
When the decision to accept or reject a lot is made on the basis of only one sample, the acceptance plan is described as a single sampling plan. This is the simplest type of sampling plan. Three quantities are specified, namely: (a) the number of items N in the lot from which the sample is to be drawn; (b) the number of articles n in the random sample drawn from the lot which is to be inspected; and (c) the acceptance number c. This acceptance number is the maximum allowable number of defective articles in the sample; more than this will cause rejection of the lot.
Thus a sampling plan may be specified in this way: (a) N = 200, (b) n = 20, (c) c = 1, which means that a random sample of 20 items is drawn from a lot containing 200 items; if the sample contains more than one defective item, reject the lot; otherwise accept the lot.
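This decision rule is easy to state in code. A minimal sketch, using the plan figures of the example (N = 200, n = 20, c = 1):

```python
def single_sampling_decision(defectives_in_sample, c=1):
    """Decision rule of a single sampling plan: accept when the number
    of defectives found does not exceed the acceptance number c."""
    return "accept" if defectives_in_sample <= c else "reject"

# With c = 1, a sample containing 0 or 1 defectives is accepted,
# while 2 or more defectives causes rejection of the lot.
decisions = [single_sampling_decision(d) for d in (0, 1, 2)]
```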
Double sampling plan
In a double sampling plan, five quantities are specified:
1) N = the number of items in the lot;
2) n1 = the number of articles in the first random sample;
3) c1 = the acceptance number for the first sample (the maximum number of defectives that will permit the acceptance of the lot on the basis of the first sample);
4) n2 = the number of articles in the second random sample;
5) c2 = the acceptance number for the two samples combined (the maximum number of defectives that will permit the acceptance of the lot on the basis of the two samples).
Thus a double sampling plan may be:
N = 500, n1 = 20, c1 = 1, n2 = 60, c2 = 4
which will be interpreted as:
1) Draw a first random sample of 20 items from the lot of 500.
2) Accept the lot on the basis of the first sample if it contains 1 or fewer defectives.
3) If the first sample contains more than one defective, draw a second sample of 60 items.
4) Accept the lot on the basis of the combined sample of 80 if the combined sample contains 4 or fewer defectives.
5) Reject the lot on the basis of the combined sample if the combined sample contains more than 4 defectives.
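One way to express this procedure in code. This is a sketch of the rule as read here, in which the second sample is drawn whenever the first is inconclusive:

```python
def double_sampling_decision(d1, d2=None, c1=1, c2=4):
    """Decision rule for the plan N=500, n1=20, c1=1, n2=60, c2=4.
    d1 = defectives in the first sample; d2 = defectives in the second
    sample, or None if it has not been drawn yet."""
    if d1 <= c1:
        return "accept on first sample"
    if d2 is None:
        return "draw second sample"
    return "accept" if d1 + d2 <= c2 else "reject"
```

For example, 1 defective in the first sample accepts the lot outright, while 3 defectives forces a second sample whose combined count decides the outcome.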
From the producer's point of view, it would be unfair to reject a lot on the basis of a single sample; this is the chief argument for double sampling.
Average Outgoing Quality (AOQ)
The expected fraction defective remaining in the lot after the application of the sampling plan is called the average outgoing quality (AOQ). This is a function of p, the actual fraction defective in the lot. The maximum value of the AOQ, the maximum being taken with respect to p, is known as the Average Outgoing Quality Limit (AOQL).
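Under rectifying inspection (rejected lots are 100% screened), the AOQ and AOQL can be sketched numerically. The single sampling plan n = 20, c = 1 below is an assumed example:

```python
from math import comb

def prob_accept(p, n=20, c=1):
    """P(accept) for an assumed single sampling plan (n, c)."""
    return sum(comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(c + 1))

def aoq(p, n=20, c=1):
    """Average outgoing quality under rectifying inspection: accepted
    lots go out with fraction defective p, rejected lots are screened
    clean, so AOQ is approximately p * P(accept)."""
    return p * prob_accept(p, n, c)

# AOQL: the maximum of the AOQ curve, found here over a grid of p values.
aoql = max(aoq(i / 1000) for i in range(1, 301))
```

The AOQ curve rises, peaks at the AOQL, and then falls again, since very bad lots are almost always rejected and screened.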
Average Sample Number (ASN)
The expected value of the sample size required for reaching a decision (i.e., acceptance or rejection) under the sampling inspection plan of a lot is called the average sample number (ASN). This is naturally a function of p, the actual fraction defective of the lot. The curve obtained by plotting ASN against p is called the ASN curve. Obviously, other factors remaining the same, the lower the ASN curve, the better the sampling inspection plan.
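For a double sampling plan, the ASN can be sketched as the first sample size plus the second sample size weighted by the probability that a second sample is needed. The plan figures follow the double sampling example in the text, and the rule assumed is that the second sample is drawn whenever the first contains more than c1 defectives:

```python
from math import comb

def asn_double(p, n1=20, n2=60, c1=1):
    """Average sample number (ASN) of a double sampling plan where the
    second sample is drawn whenever the first sample contains more
    than c1 defectives."""
    p_decide_first = sum(comb(n1, k) * p ** k * (1 - p) ** (n1 - k)
                         for k in range(c1 + 1))
    return n1 + n2 * (1 - p_decide_first)

# ASN at a range of lot qualities: near n1 for very good lots,
# approaching n1 + n2 as quality worsens.
curve = [(i / 100, asn_double(i / 100)) for i in range(0, 16)]
```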
Table of χ² (values of χ² at the 0.05 and 0.01 levels of significance)

d.f.    0.05     0.01
1       3.84     6.64
2       5.99     9.21
3       7.82     11.34
4       9.49     13.28
5       11.07    15.09
6       12.59    16.81
7       14.07    18.48
8       15.51    20.09
9       16.92    21.67
10      18.31    23.21
11      19.68    24.72
12      21.03    26.22
13      22.36    27.69
14      23.68    29.14
15      25.00    30.58
16      26.30    32.00
17      27.59    33.41
18      28.87    34.80
19      30.14    36.19
20      31.41    37.57
21      32.67    38.93
22      33.92    40.29
23      35.17    41.64
24      36.42    42.98
25      37.65    44.31
26      38.89    45.64
27      40.11    46.96
28      41.34    48.28
29      42.56    49.59
30      43.77    50.89
[Table: Selected 5 × 5 Latin Squares, Nos. 1 to 56. Each square has ABCDE as its first row; the remaining four rows are randomized permutations of the letters A to E.]
[Table: Critical values of the correlation coefficient r at the 0.05 and 0.01 levels of significance, for degrees of freedom from 1 to 700. For example, for 1 d.f. the values are 0.997 (0.05) and 1.000 (0.01); for 2 d.f., 0.950 and 0.990; for 3 d.f., 0.878 and 0.959; for 4 d.f., 0.811 and 0.917.]
5% 'F' Values
[Table: 5% points of the F distribution, with degrees of freedom for the greater variance n1 = 1, 2, 3, 8, 12, 24 and infinity across the columns, and n2 = 1 to 30, 40, 60, 120 and infinity down the rows. For example, for n1 = 1, 2, 3 and n2 = 1 the values are 161.4, 199.5 and 215.7.]
1% 'F' Values
[Table: 1% points of the F distribution, with n1 = 1 to 6, 8, 12, 24 and infinity across the columns, and n2 = 1 to 30, 40, 60, 120 and infinity down the rows. For example, for n1 = 1, 2, 3 and n2 = 1 the values are 4052, 4999 and 5403.]
[Table: Critical values at the 1% and 5% levels of significance for n = 1 to 90.]
TABLE 8 : CRITICAL VALUES OF 'T' IN THE WILCOXON SIGNED RANK TEST
[For each n from 6 to 25, the table gives critical values of T at the 0.05 and 0.01 levels of significance for a two-tailed test.]
[Table 9: Lower critical values of r in the run test, for n1 and n2 from 2 to 20.]
Notes: The values of r given in Tables 9 and 10 are critical values of r associated with selected values of n1 and n2. For the one-sample run test, any value of r that is equal to or less than the value shown in Table 9, or equal to or greater than the value shown in Table 10, is significant at the 5 percent level. For the two-sample run test, any value of r that is equal to or less than the value shown in Table 9 is significant at the 5 percent level.
[Table 10: Upper critical values of r in the run test, for n1 and n2 from 2 to 20.]