
APPLIED STATISTICS

W. Jebarani
M. Thirunavukkarasu

DEPARTMENT OF ANIMAL HUSBANDRY STATISTICS AND
COMPUTER APPLICATIONS

MADRAS VETERINARY COLLEGE
CHENNAI - 600 007

2005

ACKNOWLEDGEMENT

The authors thank the Dean, Madras Veterinary College, Chennai - 600 007 for providing financial assistance from the ICAR Development Grant to bring out this manual, and the Dean, Faculty of Basic Sciences for his constant encouragement.

The suggestions of Dr. V. Mani, Professor and Head, Dept. of Animal Husbandry Statistics and Computer Applications, Veterinary College and Research Institute, Namakkal, to improve the contents of the manual are gratefully acknowledged.

Authors

CONTENTS

TOPIC                                                   PAGE NO.
Theory of Sampling                                      01 - 16
Tests of Significance                                   17 - 34
Non-Parametric Tests (or) Distribution Free Methods     35 - 37
Partial and Multiple Correlation                        38 - 42
Path Analysis                                           43 - 46
Design of Experiment                                    47 - 61
Factorial Experiment                                    62 - 71
Analysis of Covariance (ANACOVA)                        72 - 76
Statistical Quality Control                             77 - 90
List of Tables                                          91 - 100

THEORY OF SAMPLING
Data, depending on needs and resources, may be classified into the following two groups:

1. Survey Data: This type of data already exists and can be collected and recorded by observation or enquiry.

2. Experimental Data: This type of data can only be obtained with the help of well designed statistical experiments.


The purpose of statistical surveys is to obtain information about a population. By population (or universe) is meant the aggregate of all units of a given type under consideration at a particular point of time. The information needed about the population is normally the totals or averages of the values of various characteristics.

Information on a population may be collected in two ways. One is called complete enumeration or census and the other is called sample enumeration or sample survey.

Census: Every unit in the population is enumerated.

Sample Survey: Enumeration is limited to only a part, or a sample, selected from the population.

Parameter: A statistical measure pertaining to the population, e.g. the mean or s.d. of the population and the like.

Statistic: A statistical measure pertaining to a random sample, e.g. the mean or s.d. of the random sample and the like.

Need for sampling

In many situations census methods are not possible. A laboratory technician examines a few drops of blood and draws conclusions on the blood constituents of the whole body. Examinations are conducted with a selected number of questions and students are evaluated. Samples are thus used to know the characteristics of the population. Sometimes, for want of time, cost, labour etc., we may go in for the sampling method. If the population is infinite, a census is not possible. If an article is to be destroyed in testing its character, the census method can't be carried out.

Principles of Sample Survey: The theory of sampling is based on the following important principles:

1. Principle of Statistical Regularity: This principle has its origin in the mathematical theory of probability. According to King, "the law of statistical regularity lays down that a moderately large number of items chosen at random from a large group are almost sure on the average to possess the characteristics of the large group." This principle stresses the


desirability and importance of selecting the sample at random so that each and every unit in the
population has an equal chance of being selected in the sample.
An immediate derivation from the principle of statistical regularity is the Principle of Inertia of Large Numbers, which states that, "other things being equal, as the sample size increases, the results tend to be more reliable and accurate." This is because in dealing with large numbers the variations in the component parts tend to balance each other and consequently the variation in the aggregate result is likely to be insignificant. For example, in a coin tossing experiment, the results will be approximately 50% heads and 50% tails provided the experiment is performed a fairly large number of times.
2. Principle of Validity: By the validity of a sample design we mean that it should enable us to obtain valid tests and estimates about the parameters of the population. The samples obtained by the technique of probability sampling satisfy this principle.
3. Principle of Optimization: This principle impresses upon obtaining optimum results in terms of efficiency and cost of the design with the resources at our disposal. The reciprocal of the sampling variance of an estimate provides a measure of its efficiency, while a measure of the cost of the design is provided by the total expenses incurred in terms of money and man-hours. The principle of optimization consists in

i. achieving a given level of efficiency at minimum cost, and
ii. obtaining maximum possible efficiency with a given level of cost.

Sampling and Non-sampling Errors: The errors involved in the collection, processing and analysis of data may be broadly classified under the following two heads:
1. Sampling Errors and 2. Non-sampling Errors

1. Sampling Errors: Sampling errors have their origin in sampling and arise due to the fact that only a part of the population (i.e. the sample) has been used to estimate population parameters and draw inferences about the population. As such, sampling errors are absent in a complete enumeration survey.
Sampling biases are primarily due to the following reasons:

i. Faulty selection of the sample: Some bias is introduced by the use of a defective sampling technique for the selection of a sample, e.g., purposive or judgment sampling in which the investigator deliberately selects a "representative" sample to obtain certain results. This bias can be overcome by strictly adhering to a simple random sample, or by selecting a sample at random subject to restrictions which, while improving the accuracy, are of such a nature that they do not introduce bias in the results.
ii. Substitution: If difficulties arise in enumerating a particular sampling unit included in the random sample, the investigators usually substitute a convenient member of the population. This obviously leads to some bias, since the characteristics possessed by the substituted unit will usually be different from those possessed by the unit originally included in the sample.
iii. Faulty demarcation of sampling units: Bias due to defective demarcation of sampling units is particularly significant in area surveys such as agricultural experiments in the field or crop cutting surveys. In such surveys, while dealing with border line cases, it depends more or less on the discretion of the investigator whether to include them in the sample or not.
iv. Constant error due to improper choice of the statistic for estimating the population parameter: for example, if $x_1, x_2, \ldots, x_n$ is a sample of independent observations, then the sample variance

$S^2 = \sum_{i=1}^{n} (x_i - \bar{x})^2 / n$

as an estimate of the population variance $\sigma^2$ is biased, whereas the statistic

$s^2 = \sum_{i=1}^{n} (x_i - \bar{x})^2 / (n - 1)$

is unbiased.
Remark: Increase in the sample size (i.e. the number of units in the sample) usually
results in the decrease in sampling error. In fact, in many situations this decrease in sampling
error is inversely proportional to the square root of the sample size as illustrated in the diagram
given below.
[Figure: sampling error plotted against sample size - the sampling error falls as the sample size grows, roughly in proportion to 1/sqrt(n).]
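This inverse square root behaviour is easy to verify empirically. The following sketch (Python, standard library only; the population mean of 50 and s.d. of 10 are invented purely for illustration) draws many repeated samples at several sample sizes and compares the spread of the sample means with the theoretical sigma/sqrt(n).

```python
import random
import statistics

random.seed(1)

POP_MEAN, POP_SD = 50.0, 10.0      # hypothetical population parameters

def empirical_se(n, reps=2000):
    """S.D. of the means of `reps` samples of size n - the empirical standard error."""
    means = [statistics.fmean(random.gauss(POP_MEAN, POP_SD) for _ in range(n))
             for _ in range(reps)]
    return statistics.stdev(means)

for n in (10, 40, 160, 640):
    # Quadrupling n should roughly halve the standard error (1/sqrt(n) behaviour).
    print(f"n = {n:4d}  empirical SE = {empirical_se(n):.3f}  "
          f"theory sigma/sqrt(n) = {POP_SD / n ** 0.5:.3f}")
```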


2. Non-sampling Errors: As distinct from sampling errors, which are due to the inductive process of inferring about the population on the basis of a sample, non-sampling errors primarily arise at the stages of observation, ascertainment and processing of the data, and are thus present in both the complete enumeration survey and the sample survey. Thus, the data obtained in a complete census, although free from sampling errors, would still be subject to non-sampling errors, whereas data obtained in a sample survey would be subject to both sampling and non-sampling errors.

Non-sampling errors can occur at every stage of the planning or execution of a census or sample survey. The preparation of an exhaustive list of all the sources of non-sampling errors is a very difficult task. However, careful examination of the major phases of a survey (complete


or sample) indicates that some of the more important non-sampling errors arise from the following factors.

i. Faulty planning or definitions: The planning of a survey consists in explicitly stating the objectives of the survey. These objectives are then translated into (i) a set of definitions of the characteristics for which data are to be collected, and (ii) a set of specifications for collection, processing and publishing. Here the non-sampling errors can be due to:
a. Data specification being inadequate and inconsistent with respect to the objectives of the survey.
b. Errors due to location of the units and actual measurement of the characteristics, errors in recording the measurements, errors due to ill-designed questionnaires, etc.
c. Lack of trained and qualified investigators and lack of adequate supervisory staff.
c. Lack of trained and qualified investigators and lack of adequate supervisory staff.
ii. Response Errors: These errors are introduced as a result of the responses furnished by the respondents and may be due to any of the following reasons:

a. Response errors may be accidental: for example, the respondent may misunderstand a particular question and accordingly furnish improper information unintentionally.

b. Prestige bias: An appeal to the pride or prestige of the person interviewed may introduce yet another kind of bias, called prestige bias, by virtue of which he may upgrade his education, intelligence quotient, occupation, income, etc., or downgrade his age, thus resulting in wrong answers.

c. Self-interest: Quite often, in order to safeguard one's self-interest, one may give incorrect information, e.g. a person may give an underestimate of his salary or production and an overstatement of his expenses or requirements.

d. Bias due to interviewer: Sometimes the interviewer may affect the accuracy of the response by the way he asks questions or records them. The information obtained on suggestions from the interviewer is very likely to be influenced by the interviewer's beliefs and prejudices.

e. Failure of respondent's memory: One source of error which is common to most methods of collecting information is that of 'recall'. Many of the questions in surveys refer to happenings or conditions in the past, and there is a problem both of remembering the event and of associating it with the correct time period.

3. Non-response Bias: Non-response biases occur if full information is not obtained on all the sampling units. In a house-to-house survey, non-response usually results if the respondent is not found at home even after repeated calls, or if he is unable to furnish the information on all the questions, or if he refuses to answer certain questions. Therefore, some bias is introduced as a consequence of the exclusion of a section of the population with certain peculiar characteristics, due to non-response.
4. Errors in Coverage: If the objectives of the survey are not precisely stated in clear cut terms, this may result in (i) the inclusion in the survey of certain units which are not to be included, or (ii) the exclusion of certain units which were to be included in the survey under the objectives. For example, in a census to determine the number of individuals in the age group, say, 20 years to 50 years, more or less serious errors may occur in deciding whom to enumerate unless the particular community or area is specified, along with the time at which the age is to be reckoned.
5. Compiling Errors: Various operations of data processing, such as editing and coding of the responses, punching of cards, tabulation and summarizing the original observations made in the survey, are a potential source of error. Compilation errors are subject to control through verification, consistency checks, etc.
6. Publication Errors: The errors committed during presentation and printing of tabulated results are basically due to two sources. The first refers to the mechanics of publication - the proofing errors and the like. The other, which is of a more serious nature, lies in the failure of the survey organization to point out the limitations of the statistics.
Remarks: 1. In a sample survey, non-sampling errors may also arise due to a defective frame and faulty selection of sampling units.

2. It is obvious that the non-sampling errors are likely to be more serious in a complete census as compared to a sample survey, since in a sample survey the non-sampling errors can be reduced to a greater extent by employing qualified, trained and experienced personnel, better supervision and better equipment for processing and analyzing relatively smaller data as compared to a complete census.

It has already been pointed out that usually the sampling error decreases with increase in sample size. On the other hand, as the sample size increases, the non-sampling error tends to increase. Accordingly, as sample size increases, the behaviour of the non-sampling error is likely to be opposite to that of the sampling error.

3. Quite often, the non-sampling error in a complete census is greater than both the sampling and non-sampling errors taken together in a sample survey. Obviously in such situations a sample survey is to be preferred to a complete enumeration survey.


Advantages of sampling:

- It saves time. Since only part of the population is studied, the time taken is less, not only in collecting the data but also in processing it.

- It saves cost. The amount of labour and expense involved is less for part of the population.

- It provides greater accuracy. Since only a limited number of observations is involved, greater accuracy is possible.

- It provides more detailed information. Since we deal with only a few observations, intensive details can be collected.

- Sometimes sampling is the only method available. When the population is infinite, or if the articles are to be destroyed in testing the quality, the census method can't be carried out.

Disadvantages:

- We have only an estimate of the population parameter, which may differ from the actual value of the parameter; but it is possible to calculate the sampling error or standard error, which is given along with the estimate.

Standard error / Sampling error

- It is the difference between the parameter and the statistic.

- It is the standard deviation of the sample means. If only one sample is studied, the standard error is $SD/\sqrt{n}$, where n is the size of the sample.
Estimate: It is the value of the population parameter obtained from the sample.
Estimator: That which is used to estimate the value of the population parameter.

Two types of estimate:
- Point estimate: a single value which is used to estimate the population parameter.
- Interval estimate: an estimate in which the population parameter lies between two values.
Notations used

                          Population            Sample
Characteristic values     Y1, Y2, Y3, ...       y1, y2, y3, ...
Size                      N                     n
Mean                      $\bar{Y}$             $\bar{y}$
Standard Deviation        $\sigma$              s


$\bar{y}$ = estimate of the population mean


Some properties of estimators:

- Unbiased estimator: If $\bar{Y}$ is the population parameter and $\hat{Y}_1, \hat{Y}_2, \ldots$ are its estimates, and the mean of all possible estimates is equal to $\bar{Y}$, then the estimator $\hat{Y}$ is said to be unbiased, i.e. an estimator $\hat{Y}$ is unbiased if $E[\hat{Y}] = \bar{Y}$.

- Consistent estimator: When the sample size n is increased indefinitely, the estimate $\hat{Y}$ tends to be close to $\bar{Y}$, i.e. $P(\hat{Y} \to \bar{Y})$ approaches 1 as $n \to N$.

- Minimum variance estimator: If there are two estimators $\hat{Y}_1$ and $\hat{Y}_2$, then $\hat{Y}_1$ will be a minimum variance estimator if $V[\hat{Y}_1] < V[\hat{Y}_2]$ for any other estimator $\hat{Y}_2$.


- Best estimator: If an estimator is unbiased and consistent, it is called a best estimator.

- Linear estimator: $\hat{Y}$ is a linear estimator if it can be written as a linear function of the sample observations, i.e. if $y_1, y_2, \ldots$ are the sample observations, $\hat{Y} = k_1 y_1 + k_2 y_2 + k_3 y_3 + \ldots + k_n y_n$, where $k_1, k_2, \ldots, k_n$ are constants.

- Efficient estimator: An unbiased, minimum variance estimator is called an efficient estimator.

Uses of Standard Error:

- It is used as an instrument / basis in testing hypotheses. The standard error of the mean indicates the average variation of sample means from the population mean.

- The magnitude of the standard error gives us an idea about the unreliability of the sample. The greater the standard error, the greater the departure of the actual value from the expected one, and hence the greater the unreliability of the sample. The reciprocal of the standard error is taken as a measure of the reliability or precision of the sample.


Precision = 1 / SE, where SE = $SD/\sqrt{n}$; as n increases, SE decreases and so precision increases. The more the replication, the more the precision.

- With the help of the standard error we can determine the limits within which the population parameter is expected to lie.

- In large samples, the sampling distribution of a statistic approximates the normal distribution.

Different types of sampling:

I. Random sampling or probability sampling:
   a. Unrestricted random sampling or simple random sampling
   b. Restricted random sampling:
      i. Stratified random sampling
      ii. Systematic random sampling
      iii. Cluster sampling
      iv. Multistage sampling

II. Non-random sampling or non-probability sampling:
   1. Judgment sampling
   2. Convenient sampling
   3. Quota sampling

Simple random sampling

This is the technique of drawing a sample in such a way that each unit of the population has an equal and independent chance of being included in the sample.

If N is the population size and n is the sample size, the probability that any one of the N units in the population is included in the sample is 1/N. The total number of samples of size 'n' that can be selected from the population of size 'N' is NCn (the number of combinations of N units taken n at a time).

The sampling can be done by the following two methods:
1. Lottery method - the common method
2. By using a random number table


1. Lottery method:
This is a very popular method of taking a random sample. In this method all the items of the universe are numbered or named on separate slips of paper of identical shape and size. These slips are folded in the same manner, dropped in a container or drum and shuffled well. One slip after another is drawn till the required sample size is obtained. The units whose numbers appear on the slips drawn constitute the sample. Thus the selection of items depends entirely on chance.

The lottery method is quite cumbersome when the size of the population is large.
2. Using a table of random numbers:
The alternative method of random selection is to use a table of random numbers. The following are some of the tables of random numbers available:

1. Tippett's random numbers table, consisting of 41,600 random units grouped into 10,400 sets of four-digit random numbers.
2. Fisher and Yates random number table, with 15,000 random units arranged into 1,500 sets of two-digit random numbers.
3. Kendall and Smith random numbers table, having 1 lakh random digits grouped into 25,000 sets of four-digit random numbers.
4. Rand Corporation table of random numbers, consisting of 1 lakh random digits grouped into 20,000 sets of five-digit random numbers.
5. C.R. Rao, Mitra and Matthai table of random numbers, consisting of 20,000 random digits grouped into 5,000 sets of five-digit random numbers.

E.g. Let the population size be N = 30 and the sample size n = 15. First number the population from 1 to 30. Since 30 is a two-digit number, choose a two-digit random number table. The maximum multiple of 30 which is also a two-digit number is 90. In the two-digit random number table, choose from 01 to 90, rejecting 91-99 and 00, so as to give an equal chance to all units. Have a random start in the two-digit random number table: start anywhere, and select that number if it is less than or equal to 30; if it is greater than 30, divide by 30 and take the remainder. If the remainder is 0, that corresponds to 30. Repeat the process till we get 15 different numbers (i.e. the required sample size). The units corresponding to the chosen numbers constitute the sample.
Estimation of the point estimate and confidence interval for the population mean

If $y_1, y_2, \ldots, y_n$ are the sample values chosen from the population of N units, then the mean of the sample is

$\bar{y} = (y_1 + y_2 + \ldots + y_n)/n = \sum_{i=1}^{n} y_i / n$

The sample mean $\bar{y}$ will be taken as the estimate of the population mean.

To estimate the confidence interval, we need the estimate of the standard error of the population mean.

Estimation of the standard error of the population mean [S.E.($\hat{\bar{Y}}$)]

S.E.($\hat{\bar{Y}}$) is given by

$S.E.(\hat{\bar{Y}}) = \sqrt{\dfrac{N-n}{N} \cdot \dfrac{s^2}{n}}$, where $s^2 = \sum_{i=1}^{n} (y_i - \bar{y})^2 / (n-1)$
When n > 30, the 95% confidence interval for the population mean $\bar{Y}$ is given by

$\bar{y} \pm 1.96 \; S.E.(\bar{y})$

and the 99% confidence interval by

$\bar{y} \pm 2.58 \; S.E.(\bar{y})$

For small samples, if n < 30, then the 95% confidence interval for the population mean $\bar{Y}$ is given by

$\bar{y} \pm t_{(n-1)}(5\%) \; S.E.(\bar{y})$

and the 99% confidence interval by

$\bar{y} \pm t_{(n-1)}(1\%) \; S.E.(\bar{y})$
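The pieces above combine into a short computation. The sketch below (Python, standard library; the population size of 200 and the ten sample values are invented for illustration) produces the point estimate, the standard error with the finite population correction, and the small-sample 95% confidence limits.

```python
import math
import statistics

N = 200                              # population size (assumed for illustration)
y = [12.1, 11.4, 13.0, 12.6, 11.9, 12.8, 13.4, 11.7, 12.2, 12.5]  # sample values
n = len(y)

y_bar = statistics.fmean(y)          # point estimate of the population mean
s2 = statistics.variance(y)          # s^2 with divisor (n - 1)
se = math.sqrt((N - n) / N * s2 / n) # S.E. with the finite population correction

t_5pc = 2.262                        # table t at the 5% level, d.f. = n - 1 = 9
lo, hi = y_bar - t_5pc * se, y_bar + t_5pc * se
print(f"estimate = {y_bar:.3f}, 95% CI = ({lo:.3f}, {hi:.3f})")
```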

Stratified random sampling (St.R.S)

Here the population of size 'N' is subdivided into a definite number of non-overlapping and distinct sub-populations of sizes $N_1, N_2, \ldots, N_k$ such that $N_1 + N_2 + N_3 + \ldots + N_k = N$. This procedure of dividing the population into distinct sub-populations is called stratification, and each sub-population is called a stratum. While forming a stratum we take into consideration that the units within each stratum should be more uniform with regard to the character under study; between strata there will be greater diversity or variability. This is done with the idea of improving the precision of the estimate. Within each stratum of size $N_i$, a simple random sample of $n_i$ units is drawn such that $n_1 + n_2 + \ldots + n_k = n$, where n is the size of the sample. Such a sampling method is called stratified random sampling.
Situations which lead to the adoption of St.R.S:
- When there is reason to believe that the character under investigation is highly variable.
- When estimates are required for the sub-populations also, along with the population.

Notations used: $Y_{ij}$ denotes the jth unit in the ith stratum of the population, and $y_{ij}$ the jth unit in the ith stratum of the sample.

Advantages of stratified random sampling

1. It is more representative:
In a simple random sample, some strata may be over-represented while some others may be under-represented or even excluded. In a stratified random sample, as the desired representatives are taken from the different strata, all strata are properly represented. It overcomes the possibility of any essential group of the population being completely excluded from the sample. A stratified random sample thus provides a more representative cross section of the population and is frequently regarded as the most efficient system of sampling.

2. Greater accuracy:
A stratified random sample provides estimates with increased precision. Moreover, it enables us to obtain results of known precision for each of the strata.


3. Administrative convenience:
As compared with a simple random sample, the stratified random samples would be more concentrated geographically. Accordingly, the time and money involved in collecting the data and interviewing the individuals will be considerably reduced, and supervision of the field work can be carried out with greater convenience.
Estimate of the population mean and its confidence interval

Let $\bar{y}_1, \bar{y}_2, \ldots, \bar{y}_k$ denote the sample means of the 1st stratum, 2nd stratum, ..., kth stratum. Then $\bar{y}_1$ is the unbiased estimate of the 1st stratum population mean, $\bar{y}_2$ is the unbiased estimate of the 2nd stratum population mean, and so on. In general, $\bar{y}_i$ is the unbiased estimate of the ith stratum population mean $\bar{Y}_i$.

Therefore the estimate of $\bar{Y}_i$ is

$\bar{y}_i$ = (sum of the characteristic values of all the sample units of the ith stratum) / $n_i$

$= (y_{i1} + y_{i2} + \ldots + y_{in_i}) / n_i = \sum_{j=1}^{n_i} y_{ij} / n_i$

The estimate of the population mean in St.R.S is given by the weighted mean of the $\bar{y}_i$'s with weights $N_i/N$, where $N_i/N$ is called the stratum weight:

$\bar{y}_{St.RS} = \dfrac{N_1 \bar{y}_1 + N_2 \bar{y}_2 + \ldots + N_k \bar{y}_k}{N_1 + N_2 + \ldots + N_k} = \dfrac{1}{N} \sum_{i=1}^{k} N_i \bar{y}_i$


Estimate of the standard error of $\bar{y}_{St.RS}$:

$S.E.(\bar{y}_{St.RS}) = \sqrt{\sum_{i=1}^{k} W_i^2 \left(\dfrac{N_i - n_i}{N_i}\right) \dfrac{s_i^2}{n_i}}$

where $W_i = N_i/N$ and $s_i^2$ is the estimated variance of the ith stratum.

When the sample is large (i.e. size > 30):
the 95% confidence interval for $\bar{Y}$ is given by $\bar{y}_{St.RS} \pm 1.96 \; S.E.(\bar{y}_{St.RS})$
the 99% confidence interval is given by $\bar{y}_{St.RS} \pm 2.58 \; S.E.(\bar{y}_{St.RS})$

If the sample size is < 30:
the 95% confidence interval is given by $\bar{y}_{St.RS} \pm t_{n-1}(5\%) \; S.E.(\bar{y}_{St.RS})$
the 99% confidence interval is given by $\bar{y}_{St.RS} \pm t_{n-1}(1\%) \; S.E.(\bar{y}_{St.RS})$
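A compact way to see the whole stratified computation is the sketch below (Python; the three strata, their sizes and sample values are entirely hypothetical). It forms the weighted mean and accumulates the stratum contributions to the standard error as in the formula above.

```python
import math
import statistics

# Hypothetical strata: (stratum size Ni, sample values drawn from that stratum)
strata = [
    (100, [20.0, 22.0, 19.0, 21.0]),
    (60,  [30.0, 28.0, 31.0]),
    (40,  [25.0, 27.0, 26.0]),
]
N = sum(Ni for Ni, _ in strata)

est = 0.0
var = 0.0
for Ni, sample in strata:
    ni = len(sample)
    wi = Ni / N                          # stratum weight Ni/N
    yi_bar = statistics.fmean(sample)    # unbiased estimate of the stratum mean
    si2 = statistics.variance(sample)    # estimated variance of the ith stratum
    est += wi * yi_bar                   # weighted mean over strata
    var += wi ** 2 * (Ni - ni) / Ni * si2 / ni

se = math.sqrt(var)
print(f"y_St.RS = {est:.3f}, S.E. = {se:.3f}")
```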

Systematic sampling

It is a type of simple random sampling, commonly employed when a complete and up-to-date list of the sampling units is available. It consists in selecting only the first unit at random, the rest being selected according to a predetermined pattern involving regular spacing of units. Suppose the 'N' population units are serially numbered from 1 to N in some order and a sample of size 'n' is to be chosen. Let k = N/n; this k is called the sampling interval or sampling ratio. Have a random start by choosing a number not exceeding k at random, say p; then choose every kth item: p, p + k, p + 2k, ..., and the units corresponding to these numbers form the chosen sample.
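A minimal sketch of that selection rule (Python; N = 100 and n = 10 are invented, and N is assumed to be a multiple of n):

```python
import random

random.seed(3)
N, n = 100, 10
k = N // n                               # sampling interval (assumes N is a multiple of n)
p = random.randint(1, k)                 # random start, a number not exceeding k
sample = [p + j * k for j in range(n)]   # p, p+k, p+2k, ...
print(sample)
```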

Merits
- It is more convenient than simple random sampling or stratified random sampling, as the time and work involved are relatively less.
- A systematic sample will be more efficient than a simple random sample, provided the list from which the sample units are drawn is known.

Demerits
- It is not, in general, a random sample.
- If N is not a multiple of n, the actual sample size will be different from the required size.
- The sample mean is not an unbiased estimate of the population mean.
- It may lead to highly biased estimates if there are periodic features associated with the sampling interval, i.e. if the list has a periodic feature and the sampling interval is equal to, or a multiple of, that period.

Cluster sampling
In this case, the total population is divided into some recognizable subdivisions, termed clusters, depending on the problem under study, and a simple random sample of these clusters is drawn. We then observe each and every unit in the selected clusters, which together form our sample. For example, if we are interested in obtaining the income of a city, the whole city is divided into N different blocks or localities and a simple random sample of blocks is drawn. The individuals in the selected blocks determine the cluster sample.
Multistage sampling
Instead of enumerating all sample units in the selected clusters, one can obtain better and more efficient estimators by resorting to sub-sampling within the clusters. This technique is called two-stage sampling, the clusters being termed first-stage sampling units. Carried further, the technique is called multistage sampling.

For example, if we want to study the consumption pattern of households in Tamil Nadu, the whole of Tamil Nadu is divided into different districts, which form the first stage of sampling. A simple random sample of a few districts is selected. The selected districts are divided into villages, from which a simple random sample of villages is selected; these are called the second-stage sampling units. The selected villages are divided into households and a few households are selected; this is the third stage of sampling, and here we get the desired sampling units.

At every stage, some districts or some villages are left out. This is the disadvantage.
Non-random sampling methods:

1. Purposive or deliberate or subjective or judgement sampling:
Here the investigator takes the samples exclusively at his discretion.

2. Convenient sampling:
If the investigator chooses the samples at his convenience, it is called convenient sampling.


3. Quota sampling:
It is a type of judgement sampling wherein quotas are set up according to some specified characteristics, such as 'so many in this group', 'so many in that group' and so on.

Estimate of proportion and SE of proportion

1. Simple Random Sampling

Sometimes the data in question will be qualitative in nature, wherein we get only proportions or percentages. If p is the proportion of a desired attribute in a sample, then the estimate of the proportion P for the population will be given by the corresponding sample proportion.

Estimate of the SE of $\hat{P}$:

$S.E.(\hat{P}) = \sqrt{\dfrac{N-n}{N} \cdot \dfrac{pq}{n-1}}$

where q = 1 - p, n is the sample size and N is the population size.

Confidence interval for the proportion

If n < 30, the 95% confidence interval for the population proportion is given by

$p \pm t_{(n-1)}(5\%) \; S.E.(\hat{P})$, with (n-1) d.f.

and the 99% confidence interval by

$p \pm t_{(n-1)}(1\%) \; S.E.(\hat{P})$, with (n-1) d.f.

If n > 30, the 95% confidence interval is given by $p \pm 1.96 \; S.E.(\hat{P})$ and the 99% confidence interval by $p \pm 2.58 \; S.E.(\hat{P})$, where

$S.E.(\hat{P}) = \sqrt{\dfrac{N-n}{N} \cdot \dfrac{pq}{n}}$, with q = 1 - p

2. Stratified Random Sampling

If $p_i$ is the proportion of the sample in the ith stratum, then $p_i$ is an unbiased estimate of $P_i$, i.e. the ith stratum proportion of the population. Then

$\hat{P}_{St.RS} = \sum_{i} W_i p_i$, where $W_i$ is the stratum weight of the ith stratum.
Estimate of the standard error of the proportion in stratified random sampling, $S.E.(\hat{P}_{St.RS})$:

$S.E.(\hat{P}_{St.RS}) = \sqrt{\sum_{i} W_i^2 \left(\dfrac{N_i - n_i}{N_i}\right) \dfrac{p_i q_i}{n_i - 1}}$

If n < 30, the 95% confidence interval is given by $\hat{P}_{St.RS} \pm t_{n-1}(5\%) \; S.E.(\hat{P}_{St.RS})$, and the 99% confidence interval by $\hat{P}_{St.RS} \pm t_{n-1}(1\%) \; S.E.(\hat{P}_{St.RS})$.

If n > 30, the 95% confidence interval is $\hat{P}_{St.RS} \pm 1.96 \; S.E.(\hat{P}_{St.RS})$ and the 99% confidence interval is $\hat{P}_{St.RS} \pm 2.58 \; S.E.(\hat{P}_{St.RS})$.


TESTS OF SIGNIFICANCE

A test of significance is a statistical procedure followed to test the significance of the difference between a statistic and a parameter, or between any two statistics, i.e. between the sample mean and the population mean, or between two sample means.

Hypothesis: Any statement made about the population.

Null hypothesis: There is no significant difference between the statistic and the parameter, or between any two statistics. It is usually denoted by Ho.

Alternate hypothesis: The statement contrary to the null hypothesis is the alternate hypothesis and is denoted by H1.

The null hypothesis is never proved. It is either accepted or rejected at some level of significance. Usually we will have two levels of significance.

Test Statistic: The test statistic is some statistic that may be computed from the data of the sample. It is to be pointed out that the key to statistical inference is the sampling distribution of the relevant statistic. A general formula for a test statistic that will be applicable in many of the hypothesis tests is as follows:

Test statistic = (relevant statistic - hypothesized parameter) / (s.e. of the relevant statistic)

Decision Rule: All possible values that the test statistic can assume are points on the horizontal axis of the graph of the distribution of the test statistic and are divided into two groups. One group constitutes what is known as the rejection region and the other group makes up the non-rejection region. The decision rule tells us to reject the null hypothesis if the value of the test statistic that we compute from our sample is one of the values in the rejection region, and not to reject Ho if the computed value of the test statistic is one of the values in the non-rejection region.

Significance Level: The decision as to which values go into the rejection region and which ones go into the non-rejection region is made on the basis of the desired level of significance, designated by α. The term level of significance reflects the fact that hypothesis tests are sometimes called significance tests, and a computed value of the test statistic that falls in the rejection region is said to be significant. The level of significance α specifies the area under the curve of the distribution of the test statistic that is above the values on the horizontal axis constituting the rejection region.

The level of significance α is a probability and, in fact, is the probability of rejecting a true null hypothesis. We select a small value of α in order to make the probability of rejecting a true null hypothesis small. The more frequently encountered values of α are 0.01 and 0.05.


Calculation of the test statistic: From the data contained in the sample, we compute a value of the test statistic and compare it with the rejection and non-rejection regions that have already been specified.

Statistical Decision: The statistical decision consists of rejecting or not rejecting the null hypothesis. It is rejected if the computed value of the test statistic falls in the rejection region, and it is not rejected if the computed value of the test statistic falls in the non-rejection region.

[Figure: rejection and non-rejection regions under the distribution of the test statistic, shown for the 1% and 5% levels of significance.]

Degrees of freedom:
The degrees of freedom of a statistic are the number of independent observations in the sample (n) minus the number of parameters (k) which must be estimated from the sample observations; (or) the number of independent comparisons in the sample observations; (or) the number of values that one can choose freely.
Different tests of significance:

Parametric tests: A distribution is attached to the test. The various parametric tests are:
1. Normal deviate test or large sample test or Z test
2. Students' t test or small sample test
3. Chi-square test
4. F-test or variance ratio test

Non-parametric tests: These are free from any distribution and are called distribution free methods.

In any test, we take any one of the following four types of decisions:

1. The null hypothesis may be true but our test rejects it - which is a Type I error.
2. The null hypothesis may be false but our test accepts it - which is a Type II error.
3. The null hypothesis may be true and our test accepts it - which is a correct decision.
4. The null hypothesis may be false and our test rejects it - which is a correct decision.


Statistical procedure followed in any test of significance

Step 1: Forming the null hypothesis.

Step 2: Calculation of the test statistic:

Test statistic = (statistic - parameter) / (SE of the difference)

or

Test statistic = (difference in the values of two statistics) / (SE of the difference)

Step 3: Conclusion.
Depending on the value of the test statistic, we either accept or reject Ho.
LARGE SAMPLE TEST

This is carried out when the sample size is large, i.e. n > 30. When n > 30, the statistic follows the normal distribution, whose equation is

$f(x) = \dfrac{1}{\sigma\sqrt{2\pi}} e^{-(x-m)^2 / 2\sigma^2}$

where m = mean and σ = standard deviation.

I) To test the significant difference between the sample mean and the population mean

Step 1:
Ho: There is no significant difference between the sample mean and the population mean.

Step 2:
The test statistic or 'Z' statistic is given by

Z = (difference between the means of sample and population) / (standard error of the difference)

$Z = \dfrac{\bar{x} - m}{S.E.(\bar{x} - m)} = \dfrac{\bar{x} - m}{\sigma/\sqrt{n}}$

where $\bar{x}$, m are the means of the sample and the population respectively, σ is the standard deviation of the population and n is the size of the sample; (or)

$Z = \dfrac{\bar{x} - m}{s/\sqrt{n}}$, where s is the standard deviation of the sample.

Step 3: Conclusion

i) If |Z| < 1.96, Z is not significant; we denote this as Z = ( )N.S. and Ho is accepted.
ii) If |Z| > 1.96, Z is significant at the 5% level; we denote it as Z = ( )* and Ho is rejected.
iii) If |Z| > 2.58, Z is highly significant, or significant at the 1% level; we denote it as Z = ( )** and Ho is rejected.

Example:
A sample of 300 broilers of 7 weeks old is taken from a farm. The mean weight of the sample is found to be 1.95 kg with a standard deviation of 0.24 kg. Can it be said that the sample is taken from a population of mean 2.1 kg?

Step 1:
Ho: There is no significant difference between the sample mean and the population mean.

Step 2:

$Z = \dfrac{\bar{x} - m}{s/\sqrt{n}} = \dfrac{1.95 - 2.1}{0.24/\sqrt{300}}$, so $|Z| = \dfrac{0.15}{0.0139} = (10.83)**$

Step 3:
|Z| > 2.58, so Z is highly significant, i.e. significant at the 1% level. Ho is rejected, i.e. the sample is different from the population; we can't say that the sample is taken from the said population.
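The arithmetic of this example can be checked with a few lines (Python; the cut-offs 1.96 and 2.58 are the table values used throughout this chapter):

```python
import math

def z_test_mean(x_bar, m, sd, n):
    """Z statistic for sample mean vs population mean (large sample test)."""
    z = (x_bar - m) / (sd / math.sqrt(n))
    if abs(z) > 2.58:
        verdict = "highly significant (1% level), Ho rejected"
    elif abs(z) > 1.96:
        verdict = "significant (5% level), Ho rejected"
    else:
        verdict = "not significant, Ho accepted"
    return z, verdict

z, verdict = z_test_mean(x_bar=1.95, m=2.1, sd=0.24, n=300)
print(f"Z = {z:.2f}: {verdict}")   # |Z| comes out near 10.8
```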
II) To test the significant difference between the means of two samples

Step 1:
Ho: There is no significant difference between the means of the two samples.
Step 2:

Z = (difference in the means of the two samples) / (standard error of the difference)

$Z = \dfrac{\bar{x}_1 - \bar{x}_2}{S.E.(\bar{x}_1 - \bar{x}_2)} = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$

where $\bar{x}_1$ and $\bar{x}_2$ are the means of samples of sizes $n_1$ and $n_2$ drawn from the two populations with standard deviations $\sigma_1$ and $\sigma_2$ respectively; (or)

$Z = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$

where $s_1$ and $s_2$ are the standard deviations of the samples, if the population standard deviations are not known.

Step 3:

i) If |Z| < 1.96, Z is not significant; we denote Z = ( )N.S. and Ho is accepted.
ii) If |Z| > 1.96, Z is significant at the 5% level; we denote Z = ( )* and Ho is rejected.
iii) If |Z| > 2.58, Z is highly significant, or significant at the 1% level; we denote Z = ( )** and Ho is rejected.

If the samples are taken from the same population, the test statistic is given by

$Z = \dfrac{\bar{x}_1 - \bar{x}_2}{\sigma\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$

If σ is not known, replace $\sigma^2$ by $\dfrac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2}$.
Example:
Random samples of 35 and 45 broilers are taken from a farm and their mean weights in kg are 1.7 and 2.1 respectively, with standard deviations 0.72 and 0.56 kg. Test the significance of the difference between these two samples.

Step 1: Ho: There is no significant difference between the two sample means.

Step 2:

$|Z| = \dfrac{|1.7 - 2.1|}{\sqrt{\dfrac{(0.72)^2}{35} + \dfrac{(0.56)^2}{45}}} = \dfrac{0.4}{0.1476} = (2.71)**$

Step 3:
|Z| = 2.71 > 2.58, so Z is highly significant. Ho is rejected, i.e. the two samples are different.
III) To test the significant difference for an observed proportion

Step 1: Ho: There is no significant difference between the observed proportion and the expected proportion.

Step 2: The test statistic is given by

Z = (difference between observed and expected proportions) / (S.E. of the difference)

$Z = \dfrac{p - P}{S.E.(p - P)} = \dfrac{p - P}{\sqrt{pq/n}}$

where p is the observed proportion, P the expected proportion, q = 1 - p and n = the number of trials.

Step 3: Conclusion: same as before.

Example:
In a farm, 120 calves were born in a year, out of which 73 are female. Test the hypothesis that the sexes are born in equal proportion.

i) Ho: There is no significant difference between the observed and expected proportions.

ii) p = 73/120 = 0.6083, P = 0.50

$Z = \dfrac{0.6083 - 0.5000}{\sqrt{\dfrac{0.6083 \times 0.3917}{120}}} = (2.43)*$

Conclusion:
|Z| > 1.96, so Z is significant. Ho is rejected. Hence there is a significant difference, i.e. the sexes are not born in equal proportion.

IV) To test the significant difference between two proportions

Step 1:
Ho: There is no significant difference between the two proportions.

Step 2:
Test statistic:

Z = (difference between the two proportions) / (S.E. of the difference between the two proportions)

When the proportions are from different populations,

$S.E.(p_1 - p_2) = \sqrt{\dfrac{P_1 Q_1}{n_1} + \dfrac{P_2 Q_2}{n_2}}$

where $p_1$ and $p_2$ are the two proportions, $Q_1 = 1 - P_1$ and $Q_2 = 1 - P_2$, and $n_1$, $n_2$ are the numbers of trials. When the proportions are from the same population,

$S.E.(p_1 - p_2) = \sqrt{PQ\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}$, where $P = \dfrac{n_1 p_1 + n_2 p_2}{n_1 + n_2}$ and Q = 1 - P.

Step 3: Conclusion: same as before.


Example: There are 16 diseased animals in a farm out of 500. After the treatment, the number of diseased animals is 3 out of 100. Is the treatment effective?

Ho: There is no significant difference between the two proportions.

Test statistic:
$p_1$ = 16/500 = 0.032, $Q_1$ = 0.968
$p_2$ = 3/100 = 0.03, $Q_2$ = 0.97
P = 19/600 = 0.0317, Q = 0.9683

$Z = \dfrac{0.032 - 0.03}{\sqrt{0.0317 \times 0.9683 \times (1/500 + 1/100)}} = (0.104)N.S.$

|Z| < 1.96, so Z is not significant. Ho is accepted. The treatment is not effective in controlling the disease.
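A sketch of the same computation (Python; the pooled proportion P = (n1 p1 + n2 p2)/(n1 + n2) is used, since the two samples are taken from the same population):

```python
import math

def z_two_proportions(x1, n1, x2, n2):
    """Z test for two proportions pooled from the same population."""
    p1, p2 = x1 / n1, x2 / n2
    p = (x1 + x2) / (n1 + n2)          # pooled proportion P
    se = math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

z = z_two_proportions(16, 500, 3, 100)
print(f"Z = {z:.3f}")                  # well below 1.96, so not significant
```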

V) To test the significance of an observed correlation coefficient

Step 1:
Ho: There is no significant correlation coefficient.

Step 2: The test statistic is given by

$Z = \dfrac{r - \rho}{S.E.(r - \rho)} = \dfrac{r - \rho}{(1 - \rho^2)/\sqrt{n}}$

where r is the sample correlation coefficient and ρ is the population correlation coefficient. Take ρ = 0, as we are interested in knowing whether 'r' is different from zero; then

$Z = \dfrac{r - 0}{(1 - 0)/\sqrt{n}} = r\sqrt{n}$

Step 3: Conclusion: same as before.

Note: Tables are available showing the critical values of r, from which it can be judged whether a given r is significant or not. We have to use d.f. = n - 2.

SMALL SAMPLE TESTS (OR) STUDENTS' 't' TEST

This is the test to be carried out when the sample size is less than 30. The distribution is due to GOSSET, who published it under the pen name STUDENT. It is of the form

$f(t) = C\left(1 + \dfrac{t^2}{n-1}\right)^{-n/2}$, where C is a constant.

I) To test the significant difference between the sample mean and the population mean

Step 1:
Ho: There is no significant difference between the sample mean and the population mean.

Step 2:
The test statistic or 't' statistic is

t = (difference between the means of sample and population) / (S.E. of the difference)

$t = \dfrac{\bar{x} - m}{S.E.(\bar{x} - m)} = \dfrac{\bar{x} - m}{\sigma/\sqrt{n}}$, with degrees of freedom = n - 1

where m = population mean, $\bar{x}$ = sample mean, σ = S.D. of the population and n = sample size. If the population s.d. is not known, σ can be replaced by s.

Step 3: Conclusion:
i) If |t| < table 't' at the 5% level with d.f. = n - 1, t is not significant, t = ( )N.S., and Ho is accepted.
ii) If |t| > table 't' at the 5% level with d.f. = n - 1, t is significant, t = ( )*, and Ho is rejected.
iii) If |t| > table 't' at the 1% level with d.f. = n - 1, t is highly significant, t = ( )**, and Ho is rejected.

II) To test the significant difference between two sample means when the samples are independent (or) Non-paired 't' test

By independence of samples, we mean that the observations of one sample are in no way related to the observations of the other.

Step 1:
Ho: There is no significant difference between the two samples or sample means.

Step 2:
Test statistic:

t = (difference in the means of the two samples) / (S.E. of the difference)

$t = \dfrac{\bar{x}_1 - \bar{x}_2}{S.E.(\bar{x}_1 - \bar{x}_2)} = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{s^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$

where $s^2 = \dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$

and $\bar{x}_1$, $\bar{x}_2$ are the sample means with s.d.'s $s_1$, $s_2$ and of sizes $n_1$, $n_2$ respectively, with d.f. = $n_1 + n_2 - 2$.

Step 3: Conclusion: same as before, with the respective d.f.

Note: when $n_1 = n_2 = n$, i.e. the sizes of the samples are equal,

$t = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{2s^2/n}}$, with d.f. = 2(n - 1).
Example:
A certain experiment was conducted to compare two types of pig diets, for which the following results of increase in weight were observed during the experimental period. Can we conclude that diet 1 is better than diet 2?

Diet 1: 52  53  57  52  49  52  50
Diet 2: 46  51  42  50  55  54

Ho: There is no significant difference between the two diets.

$\bar{x}_1$ = 52.142, $\bar{x}_2$ = 49.66, $s_1$ = 2.54, $s_2$ = 4.92

$s^2 = \dfrac{6 \times 2.54^2 + 5 \times 4.92^2}{6 + 7 - 2} = 14.54$

$t = \dfrac{52.142 - 49.66}{\sqrt{14.54\left(\dfrac{1}{7} + \dfrac{1}{6}\right)}} = 1.170$, with d.f. = 11

|t| = 1.170 < 2.201 (table t at the 5% level). Hence t is not significant, |t| = (1.170)N.S., and Ho is accepted. There is no significant difference between the diets.
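The pooled-variance arithmetic of this example can be reproduced directly (Python, standard library; 2.201 is the table t at the 5% level with 11 d.f.):

```python
import math
import statistics

diet1 = [52, 53, 57, 52, 49, 52, 50]
diet2 = [46, 51, 42, 50, 55, 54]
n1, n2 = len(diet1), len(diet2)

m1, m2 = statistics.fmean(diet1), statistics.fmean(diet2)
s1, s2 = statistics.stdev(diet1), statistics.stdev(diet2)

# Pooled variance: ((n1-1)s1^2 + (n2-1)s2^2) / (n1 + n2 - 2)
s2_pooled = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)
t = (m1 - m2) / math.sqrt(s2_pooled * (1 / n1 + 1 / n2))

print(f"t = {t:.3f} with {n1 + n2 - 2} d.f.")   # about 1.17, below the table value 2.201
```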

III) To test the significant difference between two dependent samples (Paired 't' test)

Dependence means the observations of the two samples have a common factor linking the two observations.

Step 1:
Ho: There is no significant difference between the two samples.

Step 2:
Test statistic:

$t = \dfrac{\bar{d}}{S.E.(\bar{d})} = \dfrac{\bar{d}}{s/\sqrt{n}}$, with d.f. = n - 1

where d is the difference between the observations of the two samples, $\bar{d}$ is the mean of d and s is the s.d. of d.

Step 3:
Conclusion: same as previous, with the respective d.f.

Example:
In a study on the effect of a new ration to increase the maximum rate of lean tissue growth in steers, the following data were obtained. Can you conclude that the ration is responsible for an increase in the rate of lean tissue growth?

Rate of lean tissue growth
Steer:                 1    2    3    4    5    6    7    8
Before giving ration:  420  490  340  560  670  540  530  590
After giving ration:   600  720  350  530  850  580  690  730

Step 1: Ho: There is no significant difference between the two stages.

Step 2:
d:  180  230  10  -30  180  40  160  140
n = 8, mean of d ($\bar{d}$) = 113.75, s.d. of d (s) = 94.10

$t = \dfrac{\bar{d}}{s/\sqrt{n}} = \dfrac{113.75}{94.10/\sqrt{8}} = (3.42)**$, with d.f. = 7

|t| > table t at the 1% level, so |t| is highly significant. Ho is rejected. The ration is responsible for the increase in the rate of lean tissue growth.
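A sketch of the paired computation (Python; the differences are taken as 'after minus before', keeping their signs):

```python
import math
import statistics

before = [420, 490, 340, 560, 670, 540, 530, 590]
after  = [600, 720, 350, 530, 850, 580, 690, 730]

d = [a - b for a, b in zip(after, before)]   # paired differences, signs retained
n = len(d)
d_bar = statistics.fmean(d)
s = statistics.stdev(d)

t = d_bar / (s / math.sqrt(n))               # d.f. = n - 1 = 7
print(f"d_bar = {d_bar:.2f}, s = {s:.2f}, t = {t:.2f}")
```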

IV) To test the significance of an observed correlation coefficient

Step 1: Ho: There is no significant correlation coefficient.

Step 2:

$t = \dfrac{r - 0}{S.E.(r)} = \dfrac{r\sqrt{n-2}}{\sqrt{1 - r^2}}$, with d.f. = n - 2

Step 3: Conclusion: same as previous, with the respective d.f.

Note: If 't' is significant, it implies that the correlation is significant, or that the two variables are related.

Example:
The coefficient of correlation between the body weight and feed intake of 15 broilers was 0.75. Test the significance of the correlation.

Ho: There is no significant correlation coefficient.

$t = \dfrac{0.75\sqrt{13}}{\sqrt{1 - 0.75^2}} = 4.088$, with d.f. = n - 2 = 13

|t| > 3.012 (table t at the 1% level), so |t| is highly significant, |t| = (4.088)**. Ho is rejected. Thus the correlation coefficient is highly significant.
Ho

V) To test the significance of the regression coefficient in a linear regression of the form y = a + bx

Step 1: Ho: There is no significant regression coefficient.

Step 2:

$t = \dfrac{b - 0}{S.E.(b - 0)} = \dfrac{b}{S.E.(b)}$, with d.f. = n - 2

where

$S.E.(b) = \sqrt{\dfrac{S_{yy} - S_{xy}^2/S_{xx}}{(n-2)\,S_{xx}}}$

$S_{xx} = \sum x^2 - (\sum x)^2/n$, $S_{yy} = \sum y^2 - (\sum y)^2/n$, $S_{xy} = \sum xy - (\sum x)(\sum y)/n$

Step 3: Conclusion: same as previous, with the respective d.f.
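A sketch of the slope test (Python; the eight x, y pairs are invented purely for illustration, and the computed t would be compared with the table t at n - 2 d.f.):

```python
import math

x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 2.9, 4.2, 4.8, 6.1, 6.9, 8.2, 8.8]
n = len(x)

# Corrected sums of squares and products, as in the formulas above
Sxx = sum(v * v for v in x) - sum(x) ** 2 / n
Syy = sum(v * v for v in y) - sum(y) ** 2 / n
Sxy = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n

b = Sxy / Sxx                                   # least-squares slope
se_b = math.sqrt((Syy - Sxy ** 2 / Sxx) / ((n - 2) * Sxx))
t = b / se_b                                    # compare with table t at n - 2 d.f.
print(f"b = {b:.3f}, S.E.(b) = {se_b:.4f}, t = {t:.1f}")
```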

CHI-SQUARE TEST

The probability density function of $\chi^2$ with d.f. = k is given by

$f(\chi^2) = \dfrac{1}{2^{k/2}\,\Gamma(k/2)}\, e^{-\chi^2/2}\, (\chi^2)^{(k/2)-1}$

Properties of $\chi^2$:

1. If $\chi_1^2, \chi_2^2, \chi_3^2, \ldots, \chi_n^2$ are independent $\chi^2$ variables with d.f. $k_1, k_2, k_3, \ldots, k_n$, then the sum of the $\chi^2$ variables, i.e. $\chi_1^2 + \chi_2^2 + \chi_3^2 + \ldots + \chi_n^2$, will be a $\chi^2$ variable with d.f. = $k_1 + k_2 + \ldots + k_n$.

2. If Z is a standard normal variable, $Z^2$ is a $\chi^2$ random variable with d.f. = 1.

3. Combining 1 and 2, if $Z_1, Z_2, \ldots, Z_n$ are 'n' standard normal variables, then $Z_1^2 + Z_2^2 + \ldots + Z_n^2$ will be a $\chi^2$ variable with d.f. = n.

Note: If X is a normal variable with mean 'm' and S.D. σ, then the standard normal variable is $Z = (X - m)/\sigma$.

4. If $X_1, X_2, X_3, \ldots, X_n$ are 'n' independent normal variables with means $m_1, m_2, \ldots, m_n$ and variances $\sigma_1^2, \sigma_2^2, \ldots, \sigma_n^2$, then $\sum_i (X_i - m_i)^2/\sigma_i^2$ will be a $\chi^2$ variable with d.f. = n.

5. If $X_1, X_2, \ldots, X_n$ is a random sample from a normal population with mean m and variance $\sigma^2$, then $\sum_{i=1}^{n} (X_i - m)^2/\sigma^2$ is distributed as $\chi^2$ with d.f. = n.

6. If $X_1, X_2, \ldots, X_n$ is a random sample from a normal population with mean 'm' and variance $\sigma^2$, then the sample mean $\bar{X}$ and the sample standard deviation are independent, and $\sum_{j=1}^{n} (X_j - \bar{X})^2/\sigma^2$ is distributed as $\chi^2$ with d.f. = n - 1.

Uses of the Chi-Square Test

The purpose of this test is to determine whether an observed set of data fits an expected set. The test helps us determine whether the sample results are consistent with the hypothesis that they were drawn from a population with a known distribution, e.g. the uniform, normal or Poisson distribution.

The uniform distribution is a continuous distribution which assumes that all possible values are equally likely. The test is also used to check whether the observed frequencies are in a given ratio; in genetic experiments, for instance, it is used to test whether the values follow the Mendelian ratio (9:3:3:1).

Chi-Square Test of Goodness of Fit

Step 1:
Ho: The fit is good, or there is no significant difference between the observed and expected frequencies.

Step 2:
Test statistic:

$\chi^2 = \sum \dfrac{(O - E)^2}{E}$, with d.f. = number of classes - 1

where O = observed value and E = expected value.

Step 3: Conclusion:
- If $\chi^2$ < table $\chi^2$ at the 5% level for the respective d.f., $\chi^2$ is not significant, $\chi^2$ = ( )N.S., and Ho is accepted.
- If $\chi^2$ > table $\chi^2$ at the 5% level for the respective d.f., $\chi^2$ is significant, $\chi^2$ = ( )*, and Ho is rejected.
- If $\chi^2$ > table $\chi^2$ at the 1% level for the respective d.f., $\chi^2$ is highly significant, $\chi^2$ = ( )**, and Ho is rejected.

Example:
In an experiment to determine the preference among various rations, a random sample of 60 steers was introduced to 5 different rations, to which the animals had free access. The preference for a ration was measured by the number of animals eating from a particular bin at any one feeding time. The observed results are displayed below.

Ration:         1   2   3   4   5
No. of steers:  13  14  11  13  9

Test the hypothesis that the rations are equally preferred.

Ho: The fit is good, (or) there is no significant difference between the observed and expected frequencies, or the rations are equally preferred.

$\chi^2 = \sum \dfrac{(O - E)^2}{E}$, with d.f. = (5 - 1) = 4 and E = 60/5 = 12

Calculated $\chi^2$ = (1.32)N.S. < table $\chi^2$ at the 5% level.

Hence $\chi^2$ is not significant and Ho is accepted, i.e. the fit is good. Hence all rations are equally preferred.
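The ration-preference arithmetic, sketched in Python (observed counts as in the table above, with equal expected counts of 60/5 = 12):

```python
observed = [13, 14, 11, 13, 9]          # steers eating from each of the 5 bins
expected = [sum(observed) / len(observed)] * len(observed)   # 12 per ration

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(f"chi-square = {chi2:.2f} with {len(observed) - 1} d.f.")  # 1.33, below table value 9.49
```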

Note: If any of the expected values is less than 5, we have to group the classes and then do the test.

$\chi^2$ test of independence

In this test our interest is focussed on the independence of items classified according to two different criteria, by rows and columns, i.e. we test the hypothesis that the classifications represented by rows and columns are independent or associated. This test is referred to as the chi-square test of independence or contingency test.

In the case of a pair of variables, the association can be studied by the correlation coefficient. But in the case of attributes, we have to classify the data in a contingency table and test for independence. The results are classified according to the different criteria in an r x c table in the following manner.

           1      2      ...    j      ...    c      Total
  1        O11    O12    ...    O1j    ...    O1c    R1
  2        O21    O22    ...    O2j    ...    O2c    R2
  ...
  i        Oi1    Oi2    ...    Oij    ...    Oic    Ri
  ...
  r        Or1    Or2    ...    Orj    ...    Orc    Rr
  Total    C1     C2     ...    Cj     ...    Cc     N

The d.f. is given by (r - 1)(c - 1), i.e. (the number of rows - 1) x (the number of columns - 1).

Step 1: Ho: The factors are independent.

Step 2:
Test statistic:

$\chi^2 = \sum_{i}\sum_{j} \dfrac{(O_{ij} - E_{ij})^2}{E_{ij}}$, with d.f. = (r - 1)(c - 1)

where $O_{ij}$ is the observed value in the ith row and jth column, and $E_{ij}$ is the expected value in the ith row and jth column, given by the product of the row total and the column total in which it lies, divided by the grand total:

$E_{ij} = \dfrac{R_i \times C_j}{N}$

Step 3: Conclusion: same as before, with the respective degrees of freedom.

Note:
If some of the values are less than 5, then we have to use Yates' correction, which is applied by adding 0.5 to the value which is less than 5 and subtracting 0.5 in some other cell to keep the totals the same. In general, if d.f. = 1 and N is small, Yates' correction is needed. For small samples, when the expected frequency is between 5 and 10, it is better to compare both the corrected and the uncorrected $\chi^2$. If the two lead to different conclusions, one can either increase the sample size or, if that is impractical, follow the exact method of probability involving the multinomial distribution.
In the case of a 2 x 2 contingency table of the following form:

                          Factor A
                          Level I    Level II    Total
  Factor B   Level I      a          b           a+b
             Level II     c          d           c+d
  Total                   a+c        b+d         a+b+c+d = n
Step 1: Ho: The factors are independent.
Step 2: $\chi^2$ is given by

$\chi^2 = \dfrac{(ad - bc)^2 \times n}{(a+b)(c+d)(a+c)(b+d)}$, with d.f. = (2-1)(2-1) = 1

Conclusion: as in the case of the $\chi^2$ test of goodness of fit (with d.f. = 1).

The above formula is applicable when all of a, b, c, d are greater than 5. If one or more is less than 5, Yates' correction for continuity is to be applied, which is as follows:

$\chi^2 = \dfrac{(|ad - bc| - n/2)^2 \times n}{(a+b)(c+d)(a+c)(b+d)}$

Conclusion: as in the previous test (with the same d.f.).
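A sketch of the 2 x 2 computation with and without Yates' correction (Python; the cell counts a, b, c, d are hypothetical):

```python
def chi2_2x2(a, b, c, d, yates=False):
    """Chi-square for a 2 x 2 table, optionally with Yates' continuity correction."""
    n = a + b + c + d
    num = abs(a * d - b * c)
    if yates:
        num = max(num - n / 2, 0)       # |ad - bc| - n/2, floored at zero
    return num ** 2 * n / ((a + b) * (c + d) * (a + c) * (b + d))

# Hypothetical counts; d.f. = (2-1)(2-1) = 1, table value 3.84 at the 5% level
print(round(chi2_2x2(18, 7, 6, 19), 2), round(chi2_2x2(18, 7, 6, 19, yates=True), 2))
```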
Coefficient of contingency:

Let $\phi^2 = \dfrac{\chi^2}{N}$, which is the mean square contingency. The coefficient of contingency is defined as

$C = \sqrt{\dfrac{\phi^2}{1 + \phi^2}} = \sqrt{\dfrac{\chi^2}{N + \chi^2}}$
$\chi^2$ test of homogeneity

It is an extension of the $\chi^2$ test of independence. The $\chi^2$ test of homogeneity is designed to determine whether two or more independent random samples are drawn from the same population or from different populations. Instead of one sample, as in the independence problem, we now have two or more samples. For example, we may be interested in finding out whether or not university students of various levels (i.e. U.G., P.G. and Ph.D.) feel the same with regard to the amount of work required by their professors, i.e. too much work, the correct amount of work or too little work. We take the hypothesis that the three samples come from the same population, i.e. the three classifications are homogeneous so far as the opinion of the three different groups of students about the amount of work required by their professors is concerned. This also means that there exists no difference in opinion among the three classes of people.

The test statistic used for the test of independence is used for the test of homogeneity. The tests of independence are concerned with the problem of whether one factor is independent
of another, while tests of homogeneity are concerned with whether different samples come from the same population. A test of independence involves a single sample taken from one population, but a test of homogeneity involves two or more independent samples, one from each of the possible populations in question.

Example: A firm selling four products is interested in finding out whether the sales are distributed similarly among four general classes of customers. A random sample of 370 sales provides the following information.
                        Products
                        1      2      3      4      Total
Farmers                 25     10     30     15     80
Factory Workers         32     20     10     28     90
Business Men            35     48     25     10     118
Professional            28     22     15     17     82
Total                   120    100    80     70     370

What conclusion can you draw from the $\chi^2$ test?

Ho: The factors are independent, (or) the customer groups are homogeneous.

The expected value in any cell is calculated as the product of the row and column totals in which the element lies, divided by the grand total.

$\chi^2$ = (44.75)**, with (r-1)(c-1) = (4-1)(4-1) = 9 d.f.

Calculated $\chi^2$ > table $\chi^2$ at the 1% level, so $\chi^2$ is highly significant and Ho is rejected, i.e. the customer groups are not homogeneous. Thus the customer groups are heterogeneous.

F Test or Variance Ratio Test (V.R. Test)

The 'F' distribution is of the form

$f(F) = C \cdot F^{(n_1/2) - 1} \left(1 + \dfrac{n_1}{n_2}F\right)^{-(n_1 + n_2)/2}$, where C is a constant.

The 'F' distribution is the ratio of two $\chi^2$ variables, i.e. if 'U' and 'V' are two $\chi^2$ variables with d.f. $n_1$ and $n_2$, then

$F = \dfrac{U/n_1}{V/n_2}$, with d.f. = $n_1$ and $n_2$

'F' is used to test the significant difference between the variances of two samples, or more. The object of the 'F' test is to discover whether two independent estimates of the population variance differ significantly, or whether the two samples may be regarded as drawn from normal populations having the same variance.
'F' Test Procedure

Step 1: Ho: There is no significant difference between the variances of the two samples.

Step 2:
Test statistic:

$F = \dfrac{s_1^2}{s_2^2}$, with d.f. = $(n_1 - 1), (n_2 - 1)$

where $s_1^2$ = estimated variance of the first sample, $s_2^2$ = estimated variance of the second sample, and $n_1$, $n_2$ are the sample sizes.

Step 3:
Conclusion:
- If cal F < tab F with d.f. $(n_1 - 1), (n_2 - 1)$ at the 5% level, F is not significant, F = ( )N.S., and Ho is accepted.
- If cal F > tab F at the 5% level with d.f. $(n_1 - 1), (n_2 - 1)$, F is significant, F = ( )*, and Ho is rejected.
- If cal F > tab F with d.f. $(n_1 - 1), (n_2 - 1)$ at the 1% level, F is highly significant, F = ( )**, and Ho is rejected.

Note: We have to take the greater variance of the two samples in the numerator.

NON-PARAMETRIC TESTS (OR) DISTRIBUTION FREE METHODS
The statistical tests that are not concerned with population parameters and do not depend on rigid assumptions about the population probability distributions are referred to as non-parametric tests or distribution free methods.
Advantages
- They are easier to carry out and to understand.
- The assumptions regarding populations are less restrictive, and so the tests are applicable to a wider range of conditions.
- A small sample is enough to obtain exact results.

Demerits
- They are less effective when only the order or rank is used.
- Because of their simplicity and easy computation using small samples, they are sometimes used for convenience where large sample, parametric methods would be more appropriate.
- Computations become burdensome when sample sizes increase.

SITUATIONS WHERE WE USE NON-PARAMETRIC TESTS
- When the data do not meet the assumptions required for a parametric test.
- When we know that the data gathered are from a population which is not normally distributed, it is appropriate to use a non-parametric test.
- When the question to be answered does not involve a parameter.
- To derive a quick and approximate result.

SIGN TEST
This test is performed when we wish to analyse two sets of data which are from the same samples, are dependent, or occur in pairs. It depends on the sign of the difference within each paired observation. The stepwise procedure is as follows.
Let Xi & Yi be the observations of the two samples.
1. Examine the pairs of observations in the two samples, i.e. (Xi, Yi), i = 1, 2, ... etc.
2. If Xi > Yi, mark a +ve sign.
3. If Xi < Yi, mark a -ve sign.
4. If Xi = Yi, discard that pair.

5. Denote the number of pairs remaining with either a +ve or a -ve sign by ' n '.
6. Denote by ' r ' the number of pairs in which the less frequent sign occurs (either +ve or -ve).
7. To test the hypothesis of no difference between the effects of the two sets of data, compare ' r ' with the critical value in the table corresponding to n.
8. If observed r ≤ Tab r, the hypothesis is rejected; otherwise it is not rejected.

Example: An experiment is conducted to compare the effects of two different rations on 10 Holsteins; the average milk yield in kg per cow per day during the period of study is given below. Do these data provide sufficient evidence to indicate a difference between the two rations?

    Ration A:            26  24  25  22  18  30  26  28  26  26
    Ration B:            18  20  24  20  20  20  24  26  27  24
    Sign of difference:   +   +   +   +   -   +   +   +   -   +

n = 10 (no pair is tied)
no. of +ve signs = 8
no. of -ve signs = 2
r = 2 (the number of pairs with the less frequent sign)
For n = 10, tab r = 1
Cal r > tab r, hence the hypothesis is not rejected: the two rations do not differ significantly.
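A short Python sketch of the sign test on the ration data above; the exact two-sided binomial probability is included as a check on the table-based decision.

    import math

    A = [26, 24, 25, 22, 18, 30, 26, 28, 26, 26]
    B = [18, 20, 24, 20, 20, 20, 24, 26, 27, 24]

    signs = [1 if a > b else -1 for a, b in zip(A, B) if a != b]  # ties discarded
    n = len(signs)                                # 10
    r = min(signs.count(1), signs.count(-1))      # 2, the less frequent sign
    # two-sided binomial probability of r or fewer of the rarer sign
    p = 2 * sum(math.comb(n, k) for k in range(r + 1)) / 2 ** n
    print(n, r, p)   # p = 0.109 > 0.05, so the hypothesis is not rejected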

Signed Rank Test
The sign test is simple to apply and we can use it even when the actual measurements are not available. However, when measurements have been obtained, the sign test is not the most effective test available. A better test, referred to as the Wilcoxon signed rank test and often simply called the signed rank test, is one that takes into account the magnitude of the observed differences. The statistical procedure is:
1. Rank the differences without regard to sign, i.e. rank the absolute values (or moduli) of the differences. The smallest difference is given rank 1, and ties are assigned average ranks.
2. Assign to each rank the sign of the observed difference.
3. Obtain the sum of the -ve ranks and the sum of the +ve ranks.
4. Denote by T the absolute value of the smaller of the two sums of ranks found in the previous step.
5. To test the hypothesis of no difference between the effects of the two treatments, compare T with the table value of T.
6. If the observed T ≤ Table T, the hypothesis is rejected, i.e. the treatments differ significantly; otherwise the hypothesis is not rejected.
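A sketch of the rank computation in plain Python, with average ranks for ties as in step 1:

    def signed_rank_T(x, y):
        d = [a - b for a, b in zip(x, y) if a != b]       # drop tied pairs
        order = sorted(range(len(d)), key=lambda i: abs(d[i]))
        ranks = [0.0] * len(d)
        i = 0
        while i < len(d):                                 # average ranks for ties
            j = i
            while j < len(d) and abs(d[order[j]]) == abs(d[order[i]]):
                j += 1
            for k in range(i, j):
                ranks[order[k]] = (i + j + 1) / 2.0       # mean of ranks i+1 .. j
            i = j
        pos = sum(r for r, v in zip(ranks, d) if v > 0)   # sum of +ve ranks
        neg = sum(r for r, v in zip(ranks, d) if v < 0)   # sum of -ve ranks
        return min(pos, neg)                              # T, compared with table T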


RUNS TEST
The theory of runs is used to test the following two cases:
Case 1: Whether the observations have been drawn at random from a single population.
Case 2: Whether two random samples come from populations having the same distribution.
Case 1
1. List the observations in the order in which they were obtained, i.e. in the order of occurrence.
2. Determine the sample median. Denote the observations below the median by a -ve sign and the observations above the median by a +ve sign.
3. Denote the number of -ve signs by n1 and the number of +ve signs by n2.
4. Count the number of runs and denote this number by r. (In terms of our symbols, a run is a sequence of signs of the same kind bounded by signs of the other kind.)
5. If Cal r ≤ the critical value of r in the table for the chosen significance level, the hypothesis is rejected; otherwise it is not rejected.
Case 2
1. List the n1 + n2 observations from the two samples together in order of magnitude, keeping note of the sample from which each observation came.
2. Count the number of runs of observations from the same sample, and denote the observed number of runs by r.
3. If Cal r ≤ the table value of r, the hypothesis is rejected; otherwise it is not rejected.
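Counting runs is mechanical; a minimal sketch:

    def count_runs(labels):
        """Number of runs in a sequence of signs or sample labels."""
        if not labels:
            return 0
        return 1 + sum(1 for a, b in zip(labels, labels[1:]) if a != b)

    # e.g. the signs from a Case 1 listing
    print(count_runs(['+', '+', '-', '-', '-', '+', '-']))   # 4 runs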


PARTIAL AND MULTIPLE CORRELATION


- When we study the relationship between two variables, it is simple correlation.
- When the number of variables is more than two, the relationship between the variables is either partial or multiple.
- In partial correlation, we measure the correlation between a dependent variable and one particular independent variable when all other variables involved are kept constant. For example, in a study of broilers, the weight of broilers, feed intake, labour used, medicinal cost etc. are the variables taken for study. If we study the relationship between the weight of broilers and feed intake, eliminating the effect of the other variables, it is partial correlation.

If we denote the dependent variable by ' y ' and the independent variables by X1, X2, X3, ... etc., the partial correlation between X1 & X2 keeping X3 constant is denoted by rX1X2.X3 or r12.3,

which is calculated by

    r12.3 = (r12 - r13 r23) / √[(1 - r13²)(1 - r23²)]

Similarly,

    r12.4 = (r12 - r14 r24) / √[(1 - r14²)(1 - r24²)]

The above partial correlations are called first order partial correlations, as we are keeping one variable constant.


When we keep two variables constant, it is called a second order partial correlation. In general, if we keep ' n ' variables constant, it is called an nth order partial correlation.
The second order partial correlation is given in terms of first order partial correlation coefficients:

    r12.34 = (r12.4 - r13.4 r23.4) / √[(1 - r13.4²)(1 - r23.4²)]

If ry1 = (0.26) N.S., then we may conclude that the variable X1 is not contributing significantly to ' y '. If, however, ry1.(23...) = (0.72)**, then the contribution of X1 singly to ' y ' is significant, and it is the influence of the other variables that masks the contribution of X1 in the simple correlation.
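A minimal sketch of the first order formula; the three input correlations here are hypothetical values.

    import math

    def partial_r(r12, r13, r23):
        """First order partial correlation r12.3."""
        return (r12 - r13 * r23) / math.sqrt((1 - r13**2) * (1 - r23**2))

    print(partial_r(0.70, 0.50, 0.60))   # correlation of X1 with X2, holding X3 constant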


MULTIPLE CORRELATION
When we study the joint relation of two or more independent variables with one dependent variable, the correlation is called multiple correlation. It is denoted by Ry(x1x2x3...) or Ry(123...). It measures the combined influence of all the independent variables on the one dependent variable.
e.g. R1(23), R1(234)
R is given by the correlation coefficient between the observed values of ' y ' and the expected values of y (Ŷ).
Coefficient of multiple determination
The square of the multiple correlation (R²) is defined as the coefficient of multiple determination, or coefficient of determination.
Note: The square of a simple correlation coefficient is similarly called the coefficient of simple determination.


Interpretation: If R²y(x1x2x3...) = 0.78, we interpret that 78% of the variation in y is due to x1, x2, x3, ... etc.
If R² is very low or non-significant, it means that we have not included in our study the characters which significantly influence y.
I

Coefficient of Non-Determination
It is given by 1 - R², or 1 - R²y(x1x2...).
It gives the amount of variation in the dependent variable due to other variables which are not taken into the study.
i.e. If R² = 0.78, then 1 - R² = 0.22, so 22% of the variation in ' y ' is due to other variables which are not taken into the study.

Coefficient of Alienation
The square root of (1 - R²) is defined as the coefficient of alienation.
MULTIPLE REGRESSION
Multiple regression analysis enables us to measure the joint effect of any number of independent variables upon a dependent variable.


A multiple regression equation is an equation for estimating a dependent variable, say ' y ', from the independent variables X1, X2, X3, ..., Xn and is called a regression equation of ' y ' on X1, X2, ..., Xn.
In functional relationship we write this as y = f(X1, X2, ..., Xn), i.e. y is a function of X1, X2, ..., Xn.
If this function ' f ' is linear, we say that it is a multiple linear regression.
A multiple linear regression with ' n ' variables is given by

    y = a0 + a1X1 + a2X2 + ............ + anXn

where a0, a1, a2, ..., an are constants. These constants a0, a1, ..., an are calculated by the ' least square method '.

The method of least squares requires that the sum of squares of the deviations of the observed values of ' y ' from the expected values of y (Ŷ) is minimum, i.e. Σ(y - Ŷ)² is minimum.
By the principle of least squares, we get the following normal equations:

    Σy   = n a0 + a1 Σx1 + a2 Σx2 + a3 Σx3 + ........... + an Σxn         (1)
    Σx1y = a0 Σx1 + a1 Σx1² + a2 Σx1x2 + a3 Σx1x3 + ....... + an Σx1xn    (2)
    Σx2y = a0 Σx2 + a1 Σx1x2 + a2 Σx2² + a3 Σx2x3 + ....... + an Σx2xn    (3)
    .................................................................
    Σxny = a0 Σxn + a1 Σx1xn + a2 Σx2xn + a3 Σx3xn + ....... + an Σxn²    (n+1)

The above equations are known as normal equations. The equations can be solved by
elimination method or by matrix method or using determinants (Cramer's rule)
Matrix Method
In matrix form, the normal equations can be written as

    A X = B

where

    A = | N     Σx1     Σx2    ...  Σxn   |
        | Σx1   Σx1²    Σx1x2  ...  Σx1xn |
        | Σx2   Σx1x2   Σx2²   ...  Σx2xn |
        | ...   ...     ...    ...  ...   |
        | Σxn   Σx1xn   Σx2xn  ...  Σxn²  |

X is the column vector of the constants (a0, a1, ..., an) and B is the column vector (Σy, Σx1y, ..., Σxny).
Let ' C ', the inverse of the matrix A, be of the form

    C = | C11  C12  ...  C1n |
        | C21  C22  ...  C2n |
        | ...  ...  ...  ... |
        | Cn1  Cn2  ...  Cnn |

Then X = C B. X can be obtained from the above matrix equation and then substituted in the multiple regression equation

    Y = a0 + a1X1 + a2X2 + ...... + anXn
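In practice the constants can be obtained with a least squares solver rather than by explicit inversion; a minimal numpy sketch (the x1, x2 and y data here are hypothetical):

    import numpy as np

    # hypothetical data for two independent variables
    x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    x2 = np.array([2.0, 1.5, 4.0, 3.5, 5.0])
    y  = np.array([3.1, 4.0, 7.2, 7.9, 10.1])

    # design matrix with a leading column of ones for a0
    X = np.column_stack([np.ones_like(x1), x1, x2])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)   # solves the normal equations
    a0, a1, a2 = coef
    y_hat = X @ coef                               # expected values of y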
Calculation of R² and Standard Error of Estimate (s)

Total sum of squares (TSS) = Σy² - (Σy)²/N

The sums of products used below are corrected for the means, e.g.

    Σx2y (corrected) = Σx2y - (Σx2)(Σy)/N
    Σxny (corrected) = Σxny - (Σxn)(Σy)/N

Sum of squares due to regression (RSS) = a1 Σx1y + a2 Σx2y + ..... + an Σxny
Sum of squares due to residuals (ESS) = TSS - RSS

S.E. of estimate: s = √[ESS/(N - k)], where N is the total number of observations and k is the number of parameters.

    R² = RSS / TSS
The significance of R can be seen from the table with d.f = N - 2, or by using the F test, the F statistic being

    F = (R²/k) / [(1 - R²)/(N - k - 1)]   with (k, N - k - 1) d.f

Significance of the partial regression coefficients (ai's)
The ' t ' statistic is given by

    t = ai / (s √Cii)   with d.f = N - k - 1

where ' k ' is the number of independent variables, N is the total number of observations, and Cii is the ith diagonal element of the inverse matrix C.



Interpretation
1. If R² is significant, it means that the independent variables (xi's) studied are enough to explain the variation in the dependent variable (y); e.g. if R² = 0.65, it means that 65% of the variation in the dependent variable is due to the independent variables studied.
2. If R² is not significant, it means that the independent variables (xi's) studied are not enough to explain the variation in y.
3. If a partial regression coefficient (ai) is significant, it means that for one unit increase in xi we will have an increase of ai units in y.
4. If a partial regression coefficient (ai) is not significant, it means that by increasing xi by one unit we do not get a significant increase in y.

For two variables, y = a + bx, which is known as simple linear regression, the normal equations are

    Σy = Na + b Σx           (1)
    Σxy = a Σx + b Σx²       (2)

Solving (1) & (2) we compute a and b, which are given by

    b = [Σxy - (Σx)(Σy)/N] / [Σx² - (Σx)²/N],    a = ȳ - b x̄

Relative Importance of different X-variables:
In a multiple regression analysis the question may be asked: which X-variables are most important in determining Y? If the objective were to predict Y or to "explain" the variation in Y, the problem would be fairly straightforward if the X-variables were independent. From the model

    Y = a0 + a1X1 + a2X2 + ...... + akXk + ε

we have in the population

    σY² = Σ ai² σi² + σε²

where σi² denotes the variance of Xi. The quantity ai²σi²/σY² measures the fraction of the variance of Y attributable to its linear regression on Xi. With a random sample from this population, the quantities ai²(Σxi²/Σy²) are sample estimates of these fractions. (In small samples a correction for bias might be advisable, since ai²Σxi²/Σy² is not an unbiased estimate of ai²σi²/σY².)
The square roots of these quantities, ai √(Σxi²/Σy²), called the standard partial regression coefficients, have sometimes been used as measures of relative importance, the X's being ranked in order of the sizes of these coefficients (ignoring signs). The quantity √(Σxi²/Σy²) is regarded as a correction for scale. The coefficient estimates aiσi/σY, the change in Y as a fraction of σY produced by one S.D. change in Xi.


PATH ANALYSIS
It is a technique to study the direct and indirect effects of independent variables on the dependent variable.
The direct effects are the contributions of the independent variables to the dependent variable on their own, and the indirect effects are the contributions of an independent variable in association with some other independent variable.
In general, if we have ' n ' independent variables X1, X2, X3, ..., Xn and the dependent variable Y, and if the direct effects are denoted by P11, P22, P33, ..., Pnn, then we have the following ' n ' equations:

    P11 r11 + P22 r12 + P33 r13 + ........ + Pnn r1n = r1y
    P11 r21 + P22 r22 + P33 r23 + ........ + Pnn r2n = r2y
    ......................................................
    P11 rn1 + P22 rn2 + P33 rn3 + ........ + Pnn rnn = rny

Solving the above equations we get P11, P22, ..., Pnn.
Let us denote by Pij the indirect effect of the ith variable via the jth variable. Then

    Pij = Pjj rij

Indirect effect of the 1st variable via the second variable (P12) = P22 r12
Indirect effect of the 1st variable via the 3rd variable (P13) = P33 r13

Residual effect = 1 - (P11 r1y + P22 r2y + P33 r3y + ........ + Pnn rny),

which is the effect due to residual factors, i.e. factors that are not studied.
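Since the equations above are linear in the direct effects, they can be solved directly; a numpy sketch with hypothetical correlations:

    import numpy as np

    # hypothetical correlations among three independent variables ...
    R = np.array([[1.0, 0.4, 0.3],
                  [0.4, 1.0, 0.5],
                  [0.3, 0.5, 1.0]])
    # ... and of each independent variable with y
    ry = np.array([0.6, 0.7, 0.5])

    P = np.linalg.solve(R, ry)   # direct effects P11, P22, P33
    indirect = R * P             # element [i, j] = Pjj * rij; off-diagonal entries
                                 # are the indirect effects of i via j
    residual = 1 - P @ ry        # effect due to factors not studied
    print(P, residual)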

Path analysis is a technique that uses linear regression models to test specific theories of causal relationships among a set of variables. Path analysis involves looking not only for relationships among variables, but also for causal relationships. Two variables that are both causally dependent on a third one will themselves be associated.


Fig: X and Y are causally dependent on Z, so the association between X and Y disappears when Z is controlled. As shown in the figure, if the association between two variables disappears under control, one can conclude that there is not a causal relationship between them. If the association does not disappear, though, we cannot necessarily conclude that the relationship is causal, since the relationship could disappear when other variables (perhaps unknown to us) are controlled. Thus we can prove noncausality, but we can never prove causality.

Path Diagram:
In developing theoretical explanations of cause-effect relationships, we might hypothesize a system of relationships in which some variables, believed to be caused by others, may in turn have effects on other variables. Thus a single multiple regression model may be insufficient for that system, since it can handle only one dependent variable. Path analysis utilizes the number of regression models necessary to include all proposed relationships in the theoretical explanation.
For example, suppose that our theory specifies that one's educational attainment depends on several factors, in particular upon one's parents' income level, one's intelligence and one's motivation to achieve. We might hypothesize, in addition, that one's motivation to achieve depends on several other (prior) factors, among them the parents' educational level and the student's general intelligence level; that the income of the parents depends in part on the parents' educational level; and that educational attainment may also depend directly on the student's intelligence.

Fig. shows a graphic summary of the theory just outlined, in the form of "a path diagram"


Fig: Example of preliminary path diagram.


Ordinarily in a path diagram, each arrow (including the curved line) would have a number written over it. These numbers, called path coefficients, are simply standardized regression coefficients for the regression equation for the dependent variable to which the arrow points. In the figure there are three sets of coefficients that must be estimated, since there are three different dependent variables, viz. 1) child's educational attainment, 2) child's achievement motivation, 3) parents' income. The path coefficients show both the relative strength of the association between variables, controlling for other variables in the sequence, and the sign of the influence. Their interpretation is simply that of multiple regression b* coefficients: a one standard deviation change in the independent variable corresponds to a b* standard deviation change in the dependent variable, controlling for the other independent variables in that particular regression equation.
An unmeasured residual variable path is usually attached to each dependent variable in the path diagram to account for the variation unexplained by its independent variables. Each residual variable represents the remaining portion (1 - R²) of the unexplained variation in its corresponding dependent variable, where R² is the coefficient of multiple determination for the regression equation with that dependent variable. Their path coefficient equals √(1 - R²). Every dependent variable will have a residual path associated with it. It is assumed that the residual factors are uncorrelated with the other independent variables in the system, and with the other residuals associated with other dependent variables in the system.
Most path models will have variables that are dependent on some other variables but are, in turn, causes of other dependent variables. These variables are sometimes labeled intervening variables, since they occur in sequence between other variables. Thus in the example, the child's achievement motivation intervenes between the child's intelligence and the child's educational attainment. This means that, if the theory is correct, the child's intelligence affects his or her educational attainment in part through its effect on achievement motivation; its effect in this sense is indirect. However, the model also proposes that the child's intelligence has a direct effect on his or her educational attainment over and above the effect through achievement motivation. By performing the regression analysis, we can test whether this is true. For example, if intelligence affects educational attainment only through its effect on motivation, then the direct path (controlling for motivation) will have a nonsignificant path coefficient. However, if intelligence works both directly and indirectly, then all three coefficients of the paths leading from intelligence to educational attainment should be significant. If we do find a nonsignificant path, then we can erase that path from the diagram and perform the appropriate analysis again to re-estimate the coefficients of the remaining paths.
In the figure, the path coefficients have been shown in the form seen in the research literature. The residual variables for the three dependent variables are denoted by R1, R2 and R3. If 28% of the child's educational attainment were explained by its three predictors, for example, then the path coefficient of the residual variable R1 for the child's educational attainment would be √(1 - R²) = √(1 - 0.28) = 0.85. It appears from the figure that, of the three direct predictors, the achievement motivation of the child had the strongest partial effect on his or her educational attainment (controlling for the child's intelligence and parents' income). The child's intelligence has a moderate indirect effect, through increasing achievement motivation, as well as a direct effect on educational attainment. The parents' income is not as important as the child's achievement

motivation or intelligence in determining the child's educational attainment, but the parents' educational level has an important effect on the child's achievement motivation. Of course, such conclusions would have to be weakened or modified if there were substantial sampling error in the path coefficients.
In summary, the basic steps in a path analysis are as follows:
1. Set up a preliminary theory to be tested, drawing the path diagram without the path coefficients.
2. Do the necessary regression modeling to estimate the path coefficients and the residual coefficients.
3. Evaluate the model, perhaps erasing nonsignificant paths and recalculating the path coefficients for the new model.


DESIGN OF EXPERIMENT
Design of experiment means planning an experiment. It may be defined as the logical construction of an experiment in which the degree of uncertainty with which the inference is drawn may be well defined.
The subject matter of the design of experiment is:
- Planning of the experiment
- Obtaining relevant information from it regarding the statistical hypothesis under study
- Making a statistical analysis of the data

Definition of terms used
Experiment
It is a device or means of getting an answer to the problem under consideration.
Treatment
The various objects of comparison in a comparative experiment are termed treatments.
Eg: 1. In nutrition experiments, the different diets.
    2. In agricultural experiments, the different fertilizers.
Experimental unit
The smallest division of the experimental material to which a treatment is applied, and on which the observation on the variable under study is made, is termed the experimental unit.
Eg: In a nutrition experiment, a group of pigs in a pen.
Blocks
In agricultural experiments, we divide the whole experimental material into relatively homogeneous subgroups or strata. These strata, which are more uniform among themselves than the field as a whole, are known as blocks. In animal husbandry experiments, breed can be taken as a block.
Yield or response
The measurement of the variable under study on the different experimental units is termed the yield, i.e. it is the outcome of the experiment.
Experimental error
It is the unit-to-unit variation within the same treatment group, and is the measure of variation due to uncontrollable or unassignable causes.

It describes the failure of two identically treated experimental units to yield identical results.

Purpose of experimental design
It is to provide the maximum amount of information relevant to the problem under investigation.
Basic principles of experimental design
There are three basic principles of experimental design. They are:
1. Replication
2. Randomization
3. Local control

Replication
It refers to the number of repetitions of the treatments; it means execution of the treatments more than once. In other words, the repetition of the treatments under investigation is known as replication.
An experimenter resorts to replication in order to average out the influence of chance factors on different experimental units. Thus the repetition of treatments results in a more reliable estimate than is possible with a single observation.

Advantages:
- It serves to reduce the experimental error and thus enables us to obtain a more precise estimate of the treatment effects. We know that the standard error of the mean of a sample of size ' n ' is

    S.E. = S.D./√n

The precision of a design varies inversely with the S.E.; as ' n ' increases, S.E. decreases and hence precision increases.
- The most important purpose of replication is to provide an estimate of the experimental error, without which we cannot test the significance of the difference between any two treatments or determine the length of confidence intervals.
- The estimate of experimental error is obtained by considering the differences among experimental units receiving the same treatment in different replications, and there is no other alternative for obtaining this estimate.


Note: The adequate number of replications for the various treatments in an experiment depends upon knowledge of the variability of the experimental material. A general rule is to have as many replications as will provide at least 12 degrees of freedom for error, since beyond 12 d.f the table ' F ' values do not decrease rapidly; it is always better to use at least four replications.
Randomization
It is the allocation of treatments to experimental units so that each treatment gets an equal chance of being selected, i.e. it is the process of assigning the treatments to the various experimental units in a purely chance manner.
Advantages:
- It eliminates personal bias.
- It eliminates unanticipated influences.

Local control
If the experimental material is heterogeneous and the different treatments are allocated to the various units at random over the entire experimental material, the heterogeneity of the experimental units will add to the effect of the uncontrolled factors and thus increase the experimental error.
It is desirable to reduce the experimental error as far as possible without unduly increasing the number of replications, or without interfering with the statistical requirement of randomness, so that even smaller differences between treatments can be detected as significant. In addition to the principles of replication and randomization, the experimental error can be further reduced by grouping the experimental units into homogeneous groups. The process of reducing the experimental error by dividing the relatively heterogeneous experimental material into homogeneous blocks, such that within a block there is uniformity as far as possible and between the blocks there is variation, is known as local control.
Advantages
- It increases the efficiency of the design by reducing the experimental error.
- In the Completely Randomized Design (CRD), there is no local control.
- In the Randomized Block Design (RBD), there is local control in one direction, with one criterion.
- In the Latin Square Design (LSD), there is local control with two criteria, in two directions.

Besides these three principles we have two more principles in any experimental design, which are:
1. Auxiliary variable
2. Control


Auxiliary Variable:
In any experiment there may be some initial variables which may influence the response in our experiment. For example, in weight gain studies, initial weight will be an auxiliary variable. We have to choose all the auxiliary variables and record their values before applying the treatments.
Control:
For effective comparison, it is better to have control groups in which no treatment is applied.
The purpose of the auxiliary variable and the control is to increase the precision of the experimental design.
COMPLETELY RANDOMIZED DESIGN (CRD)
It is the simplest of all designs, based on the principles of randomization and replication. In this design, treatments are allotted at random to the experimental units over the entire experimental material. Let us suppose that we have ' t ' treatments, the ith treatment replicated ni times, i = 1, 2, ..., t. Then the whole experimental material is divided into N = Σni experimental units and the treatments are distributed completely at random. Randomization assures that extraneous factors do not influence the treatments. In particular, if ni = n for all treatments, i.e. all the treatments are replicated equally, the randomization gives every group of ' n ' units an equal chance of receiving the treatments.
Advantages:
- In CRD, we can use all the experimental units available.
- The design is very flexible, since we can use any number of treatments with any number of replications without complicating the statistical analysis.
- The statistical analysis remains simple if some or all the observations for any treatment are lost; we merely carry out the statistical analysis with the available data. Moreover, the loss of information due to missing data is smaller in comparison with any other design.
Disadvantages:
- If the experimental units are not homogeneous, the error variance will be larger, which makes the design less efficient and results in less sensitivity in detecting significant effects.
Applications:
1. It is most useful when the experimental units are homogeneous.
2. It is used in situations where some of the experimental units are likely to be destroyed or to fail to respond during the period of experimentation.

Statistical Analysis of CRD
The linear model is

    Yij = μ + ti + eij

where Yij is the response value of the jth unit receiving the ith treatment, μ is the general mean effect, ti is the effect due to the ith treatment, and eij is the error effect due to chance, the eij being identically and independently distributed (i.i.d.) normally with mean 0 and variance σe², written eij ~ i.i.d. N(0, σe²).
Let us consider the case of a CRD with ' t ' treatments, tabulated as follows (Yij denoting the jth observation under the ith treatment):

             Tr1      Tr2      ...   Tri      ...   Trt
             Y11      Y21      ...   Yi1      ...   Yt1
             Y12      Y22      ...   Yi2      ...   Yt2
             ...      ...      ...   ...      ...   ...
             Y1n1     Y2n2     ...   Yini     ...   Ytnt
    Total:   T1       T2       ...   Ti       ...   Tt

Let T1, T2, ..., Ti, ..., Tt be the treatment totals, where

    Ti = Σj Yij = sum of the response values of all the experimental units in the ith treatment,

and let

    G = ΣΣ Yij = grand total of the response values of all the ' N ' experimental units = sum of the treatment totals.
Writing Ȳi for the mean of the ith treatment and Ȳ for the grand mean,

    Total sum of squares [T.S.S.] = Σi Σj (Yij - Ȳ)²
      = Σi Σj [(Yij - Ȳi) + (Ȳi - Ȳ)]²
      = Σi Σj (Yij - Ȳi)² + Σi Σj (Ȳi - Ȳ)² + 2 Σi (Ȳi - Ȳ) Σj (Yij - Ȳi)
      = sum of squares due to error + sum of squares due to treatments + 0,

since Σj (Yij - Ȳi) = 0 within each treatment. That is,

    T.S.S. = E.S.S. + Tr.S.S.

Stepwise procedure of CRD:
Step 1: Ho: There is no significant difference between the treatments.
Step 2: Find the treatment totals T1, T2, ..., Tt.
Step 3: Find the grand total G = T1 + T2 + ......... + Tt.
Find the correction factor (C.F.) or correction term (C.T.):

    C.F. = G²/N

Calculation of the sums of squares:
i) Total sum of squares (T.S.S.) = sum of squares of the response values of all the experimental units - correction factor

    T.S.S. = ΣΣ Yij² - C.F.

ii) Treatment sum of squares (Tr.S.S.)

    Tr.S.S. = T1²/n1 + T2²/n2 + ........ + Ti²/ni + ........ + Tt²/nt - C.F.

iii) Error sum of squares (E.S.S.) = T.S.S. - Tr.S.S.

Step 4: Formation of the ANalysis Of VAriance table (ANOVA or AOV table)

ANOVA
Sources of variation        d.f      S.S       Mean Square = S.S/d.f        F
Between treatments          t - 1    Tr.S.S    Tr.M.S = Tr.S.S/(t - 1)      F = Tr.M.S/E.M.S
Within treatments (Error)   N - t    E.S.S     E.M.S = E.S.S/(N - t)
Total                       N - 1    T.S.S

Error d.f = Total d.f - treatment d.f = (N - 1) - (t - 1) = N - t

Step 5: Interpretation
Case 1: If Cal F < Tab F for d.f = (t - 1), (N - t) at the 5% level, F is not significant, F = ( ) N.S., Ho is accepted; the treatments do not differ significantly.
Case 2: If Cal F > Tab F for d.f = (t - 1), (N - t) at the 5% level, F is significant, F = ( )*, and Ho is rejected.
Case 3: If Cal F > Tab F for d.f = (t - 1), (N - t) at the 1% level, F is highly significant, F = ( )**, and Ho is rejected.
In cases 2 & 3, when we reject Ho we have to test the significance of the different treatments among themselves; we have to work out the critical difference (C.D.) between any two treatments.

Critical difference (C.D.)
It is the least significant difference between any two treatments, above which the treatments will be declared significantly different.

C.D. between any two treatments at the 5% (1%) level = S.E. of the difference between the two treatments × tab ' t ' for error d.f at the 5% (1%) level

    C.D. between Tri & Trj at the 5% (1%) level = √[EMS (1/ni + 1/nj)] × tab ' t ' for error d.f at the 5% (1%) level

where ni & nj are the replications of Tri & Trj respectively.

Calculate the treatment means, i.e.

    T̄r1 = T1/n1, T̄r2 = T2/n2, .........., T̄ri = Ti/ni, .........., T̄rt = Tt/nt

and find the difference between the means of the treatments; if it is greater than the C.D. declare them significantly different, and if it is less than the C.D. declare them non-significant.
Bar chart representation
Write the treatment means in ascending order and draw a bar above the treatments which do not differ significantly.
Note: In CRD, if the replications are equal, i.e. n1 = n2 = ......... = nt = n (say),

    Tr.S.S. = (T1² + T2² + ........ + Tt²)/n - C.F.

and the C.D. between any two treatments at the 5% or 1% level = √(2EMS/n) × tab ' t ' for error d.f at the 5% or 1% level.
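The whole CRD computation above is easily mechanized; a minimal numpy sketch (treatment groups passed as arrays, hypothetical data shown):

    import numpy as np

    def crd_anova(groups):
        """CRD F test following the stepwise procedure above."""
        y = np.concatenate(groups)
        N, t = y.size, len(groups)
        CF = y.sum() ** 2 / N                                   # correction factor G^2/N
        TSS = (y ** 2).sum() - CF                               # total S.S
        TrSS = sum(g.sum() ** 2 / g.size for g in groups) - CF  # treatment S.S
        ESS = TSS - TrSS                                        # error S.S
        TrMS, EMS = TrSS / (t - 1), ESS / (N - t)
        return TrMS / EMS, (t - 1, N - t), EMS

    # hypothetical response values for three treatments
    F, df, EMS = crd_anova([np.array([26.0, 24, 25]),
                            np.array([18.0, 20, 24, 20]),
                            np.array([30.0, 28, 26])])
    print(F, df)   # compare with Tab F at the 5% (1%) level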


RANDOMIZED BLOCK DESIGN (RBD)
If the whole of the experimental material is not homogeneous, then a simple method of controlling the variability of the experimental material consists in stratifying/grouping the whole area into relatively homogeneous strata or subgroups, called blocks or replicates. If the treatments are applied at random to the relatively homogeneous units within each stratum or block, and replicated over all the blocks, then the design is the Randomized Block Design.
The blocking has to be done on the basis of some observable character which is likely to have an influence on the character under study.
Advantages of RBD over CRD
- In the randomized block design, treatments are allocated at random within the units of each block, and the variation among blocks is removed from the variation due to error. Hence one source of variation is controlled by stratification, and the experimenter prefers the randomized block design to the completely randomized design.
Disadvantages
- When data from some individual units are missing, we have to use the missing plot technique to estimate the missing values and then carry out the analysis of variance (ANOVA). If the missing observations are many, this design is less convenient than the completely randomized design.

- In each block we must have a number of experimental units equal to the number of treatments, or a multiple of it. If we have ' t ' treatments and ' b ' blocks, the total number of experimental units needed is N = bt, such that in each block we have ' t ' experimental units so as to apply all the ' t ' treatments.
- The efficiency of the design decreases as the number of treatments, and hence the block size, increases.

Statistical analysis in RBD
The model is

    Yij = μ + ti + bj + eij

where Yij is the response value of the experimental unit receiving the ith treatment in the jth block, μ is the general mean effect, ti is the ith treatment effect, bj is the jth block effect, and eij is the error effect due to the random component, the eij being i.i.d. N(0, σe²).

Step 1: Ho: There is no significant difference among the treatments.
Step 2: Suppose the response values are given as follows:

                Tr1    Tr2    ...   Trt   | Total
    B1          Y11    Y21    ...   Yt1   |  B1
    B2          Y12    Y22    ...   Yt2   |  B2
    ...         ...    ...    ...   ...   |  ...
    Bb          Y1b    Y2b    ...   Ytb   |  Bb
    Total       T1     T2     ...   Tt    |  G.T.

i) Calculation of the treatment totals T1, T2, ..., Ti, ..., Tt
ii) Calculation of the block totals B1, B2, ..., Bj, ..., Bb
iii) Calculation of the grand total (G.T.) = sum of the response values of all the ' N ' units, or the sum of all the treatment totals, or the sum of all the block totals:

    G.T. = ΣΣ Yij = T1 + T2 + ..... + Ti + ..... + Tt = B1 + B2 + ..... + Bj + ..... + Bb

iv) Correction factor (C.F.) = G²/(b × t)

Step 3: Calculation of the sums of squares
i) Total sum of squares (T.S.S.) = sum of squares of the response values of all ' N ' experimental units - C.F. = ΣΣ Yij² - C.F.
ii) Treatment sum of squares (Tr.S.S.) = (T1² + T2² + ............ + Tt²)/b - C.F.
iii) Block sum of squares (B.S.S.) = (B1² + B2² + .......... + Bb²)/t - C.F.
iv) Error sum of squares (E.S.S.) = T.S.S. - (Tr.S.S. + B.S.S.)

Step 4: Formation of the ANOVA table

Source of variation    d.f               S.S      M.S                            F
Between treatments     t - 1             Tr.S.S   TrMS = TrSS/(t - 1)            TrMS/EMS
Between blocks         b - 1             B.S.S    BMS = BSS/(b - 1)              BMS/EMS
Error                  (t - 1)(b - 1)    E.S.S    EMS = ESS/[(t - 1)(b - 1)]
Total                  bt - 1            T.S.S

Error d.f = Total d.f - (treatment d.f + block d.f) = bt - 1 - (t - 1 + b - 1) = (t - 1)(b - 1)
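A numpy sketch of the same computation from a blocks × treatments table (hypothetical layout, rows = blocks, columns = treatments):

    import numpy as np

    def rbd_anova(y):
        """RBD sums of squares; y is a (b, t) array, rows = blocks."""
        b, t = y.shape
        CF = y.sum() ** 2 / (b * t)
        TSS = (y ** 2).sum() - CF
        TrSS = (y.sum(axis=0) ** 2).sum() / b - CF   # treatment totals over blocks
        BSS = (y.sum(axis=1) ** 2).sum() / t - CF    # block totals over treatments
        ESS = TSS - (TrSS + BSS)
        EMS = ESS / ((t - 1) * (b - 1))
        F_tr = (TrSS / (t - 1)) / EMS
        F_bl = (BSS / (b - 1)) / EMS
        return F_tr, F_bl, EMS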

Interpretation
I) For treatments
i) If Cal F < Tab F at the 5% level, d.f = (t - 1), (t - 1)(b - 1): F is not significant, F = ( ) N.S., Ho is accepted.
ii) If Cal F > Tab F at the 5% level, d.f = (t - 1), (t - 1)(b - 1): F is significant, F = ( )*, Ho is rejected.
iii) If Cal F > Tab F at the 1% level, d.f = (t - 1), (t - 1)(b - 1): F is highly significant, F = ( )**, Ho is rejected.
If F is significant or highly significant, we have to work out the critical difference:

    C.D. between any two treatments at the 5% or 1% level = √(2EMS/b) × tab ' t ' for error d.f at the 5% or 1% level

Work out the treatment means; if the difference between any two treatment means is less than the critical difference they do not differ significantly, and if it is greater than the critical difference C.D., they differ significantly.

II) Interpretation for blocks:
i) If Cal F < Tab F for d.f = (b - 1), (t - 1)(b - 1) at the 5% level: F is not significant, F = ( ) N.S., Ho is accepted.
ii) If Cal F > Tab F for d.f = (b - 1), (t - 1)(b - 1) at the 5% level: F is significant, F = ( )*, Ho is rejected.
iii) If Cal F > Tab F for d.f = (b - 1), (t - 1)(b - 1) at the 1% level: F is highly significant, F = ( )**, Ho is rejected.
If it is significant or highly significant, we have to work out the critical difference between any two blocks:

    C.D. between any two blocks at the 5% or 1% level = √(2EMS/t) × tab ' t ' for error d.f at the 5% or 1% level

We have to work out the block means. If the difference between block means is less than the critical difference, the blocks do not differ significantly; if it is greater than the critical difference, the blocks differ significantly.
Note:
- If there is no significant difference between the blocks, we have not gained anything by using RBD; we have only lost the block d.f = b - 1 from the error d.f.
- If there is a significant difference between blocks and no significant difference between treatments, then we should not conclude that the treatments are alike, for the block differences might be responsible for the non-significance of the treatments.
- If there is significance for both blocks and treatments, or non-significance for both blocks and treatments, then the interpretation is a valid one.

Missing plot technique
If the value of one observation is missing, we have to estimate that value and then do the analysis. The missing value (' X ') is given by

    X = (tT + bB - G) / [(t - 1)(b - 1)]

where
    t = number of treatments
    b = number of blocks
    T = sum of the items in the treatment with the missing value
    B = sum of the items in the block with the missing value
    G = grand total

We subtract from the treatment sum of squares (Tr.S.S.) the value

    [B - (t - 1)X]² / [t(t - 1)²]

We have to subtract 1 d.f from the total d.f, and consequently the error d.f will be reduced by 1.
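A one-line helper for the estimate above:

    def rbd_missing_value(t, b, T, B, G):
        """Estimate for a single missing observation in an RBD."""
        return (t * T + b * B - G) / ((t - 1) * (b - 1))

    # e.g. t = 5 treatments, b = 4 blocks, with hypothetical totals
    print(rbd_missing_value(5, 4, T=80.0, B=60.0, G=400.0))   # 20.0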

The standard error of the difference between a treatment with a missing value and another treatment is given by

    S.E. = √{ EMS [ 2/b + t/(b(t - 1)(b - 1)) ] }

and the corresponding critical difference is

    C.D. = √{ EMS [ 2/b + t/(b(t - 1)(b - 1)) ] } × tab ' t ' for error d.f

If two values are missing, say ' X ' and ' Y ', we have to guess a suitable value for one of them, say ' X ', and then calculate ' Y '; then, taking that value for ' Y ', we recalculate ' X ', and repeat the process till successive values of each unknown agree closely.
Here we have to subtract

    [B - (t - 1)X]² / [t(t - 1)²]  +  [B' - (t - 1)Y]² / [t(t - 1)²]

from the Tr.S.S. (B and B' being the block totals corresponding to X and Y), and two d.f will be subtracted from the total d.f and hence also from the error d.f.

Relative Efficiency of RBD over CRD:

    R.E. of RBD over CRD = [(b - 1)BMS + b(t - 1)EMS] / [(bt - 1)EMS]

If the blocking has been effective in increasing the precision of the experiment, the relative efficiency will be greater than 1. The quantity (R.E. - 1) measures the increase in precision due to blocking.

LATIN SQUARE DESIGN
When the available experimental material is known to be subject to two major sources of variation, the experimental units are grouped according to these two sources of variation, so as to have a two-way elimination of variability in the experimental units. The double groupings (double blockings) are called rows and columns. In each row and each column every treatment is applied once. This leads to an arrangement of ' t ' treatments in a square of ' t ' rows and ' t ' columns such that every treatment is allotted once in every row and once in every column. Such a design is called the Latin Square Design (LSD). The number of experimental units for an LSD with ' t ' treatments is t × t, i.e. t².
With 5 treatments the number of experimental units needed is 25; with six treatments it is 36, and so on. When more experimental units are required for the experiment, the experimental units are likely to be heterogeneous, and the allotment of treatments becomes complicated. For fewer than 5 treatments the error d.f will be small (it is advisable to have a minimum error d.f of 12 for a valid conclusion). In general, LSD is adopted for 5 to 8 treatments.

Statistical Analysis
The response value ' Yijk ' corresponding to the ith treatment in the jth row and kth column is of the form

    Yijk = μ + ti + rj + ck + eijk

where μ is the general mean effect; ti, rj, ck are the effects due to treatment, row and column; and eijk is the error effect, identically and independently normally distributed with constant variance, i.e. i.i.d. N(0, σe²).
Randomisation in LSD
The allotment of the different treatments to the units, without repetition of any treatment in any row or any column, should be done as follows:
- Get a random Latin square of the required size ' t ' × ' t ', where ' t ' is the number of treatments, from Table XV of the Statistical Tables for Biological, Agricultural and Medical Research by Fisher & Yates.
- The choice of the random Latin square is decided by a random number which is less than the total number of available squares in the table.
- Number the columns 1, 2, ..., ' t ' and get a random arrangement of the numbers 1, 2, ..., t; accordingly rearrange the columns.
- Keeping the first row fixed, number the remaining rows 1, 2, 3, ..., (t - 1), and accordingly rearrange the rows.
The Latin square so obtained is one in which the treatments are allotted randomly to the experimental units. The treatments are then applied to the experimental units, which have been grouped according to the two-way variation among the units.
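A small sketch of a comparable randomization in Python: it builds a cyclic t × t square and then shuffles rows and columns. This is a convenient stand-in for drawing a square from the Fisher & Yates tables; it does not sample uniformly from all possible Latin squares.

    import random

    def random_latin_square(t):
        """Cyclic Latin square with rows and columns shuffled at random."""
        square = [[(i + j) % t for j in range(t)] for i in range(t)]
        random.shuffle(square)                 # permute the rows
        cols = list(range(t))
        random.shuffle(cols)                   # permute the columns
        return [[row[c] for c in cols] for row in square]

    for row in random_latin_square(5):
        print(row)   # each treatment 0..4 appears once per row and per column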

Stepwise procedure in a t × t LSD
Step 1: Ho: The treatment means do not differ significantly.
Step 2: Calculation of:
1. Treatment totals T1, T2, ..., Tt
2. Row totals R1, R2, R3, ..., Rt
3. Column totals C1, C2, ..., Ct
4. Grand total (G) = sum of the row totals, or sum of all the column totals, or sum of all the treatment totals, or sum of all the response values of the experimental units
5. Correction factor

    C.F. = G²/(t × t)

Step 3: Calculation of all the sums of squares
1. Total sum of squares (T.S.S.) = (sum of squares of all the t × t response values) - correction factor

    T.S.S. = ΣΣΣ Yijk² - C.F.

2. Treatment sum of squares (Tr.S.S.) = (T1² + T2² + ........... + Tt²)/t - C.F.
3. Row sum of squares (R.S.S.) = (R1² + R2² + ............ + Rt²)/t - C.F.
4. Column sum of squares (C.S.S.) = (C1² + C2² + .......... + Ct²)/t - C.F.
5. Error sum of squares (E.S.S.) = T.S.S. - (Tr.S.S. + R.S.S. + C.S.S.)


Step 4: Formation of the ANOVA table

Sources of variation   d.f              S.S      M.S                           F
Treatments             t - 1            Tr.S.S   TrMS = TrSS/(t - 1)           TrMS/EMS
Rows                   t - 1            R.S.S    RMS = RSS/(t - 1)             RMS/EMS
Columns                t - 1            C.S.S    CMS = CSS/(t - 1)             CMS/EMS
Error                  (t - 1)(t - 2)   E.S.S    EMS = ESS/[(t - 1)(t - 2)]
Total                  t² - 1           T.S.S

Error d.f = Total d.f - (treatment d.f + row d.f + column d.f) = t² - 1 - 3(t - 1) = (t - 1)(t - 2)

Step 5: Interpretation
1. If Cal F < Tab F for d.f = (t - 1), (t - 1)(t - 2) at the 5% level: F is not significant, F = ( ) N.S., Ho is accepted.
2. If Cal F > Tab F for d.f = (t - 1), (t - 1)(t - 2) at the 5% level: F is significant, F = ( )*, Ho is rejected.
3. If Cal F > Tab F for d.f = (t - 1), (t - 1)(t - 2) at the 1% level: F is highly significant, F = ( )**, Ho is rejected.
In the last two cases we have to work out the critical difference between any two treatments (or rows or columns) at the 5% and 1% levels, which is given by

    C.D. = √(2EMS/t) × tab ' t ' for error d.f at the 5% or 1% level

Then we work out the treatment means, row means and column means; if the difference between any two treatment (row or column) means is less than the C.D. they do not differ significantly, and if it is greater than the C.D. they differ significantly.
Missing plot technique in LSD
The missing value is

    X = [t(R + C + T) - 2G] / [(t - 1)(t - 2)]

where R, C and T are the row total, column total and treatment total of the row, column and treatment in which the value is missing, G is the grand total, and t is the number of treatments.
The treatment sum of squares will be reduced by

    [G - R - C - (t - 1)T]² / [(t - 1)³(t - 2)²]

The total d.f will be reduced by one, and hence the error d.f will be reduced by one.
The standard error (S.E.) of the difference between a treatment with a missing value and another treatment is given by

    S.E. = √{ EMS [ 2/t + 1/((t - 1)(t - 2)) ] }

Advantages of LSD
- With two-way blocking or grouping, LSD controls more of the variation than CRD or RBD.
- LSD is an incomplete three-way layout. Its advantage over the complete three-way layout is that instead of t³ experimental units we use only t² experimental units.
- The statistical analysis can be carried out easily, even though it is slightly more complicated than for RBD.
- Even with missing values, the analysis can be done using the missing plot technique.
- More than one factor can be investigated simultaneously, and with fewer trials than in more complicated designs.
Disadvantages of LSD
- The fundamental assumption that there is no interaction between the different factors may not be true in general.
- Unlike RBD, in LSD the number of treatments is restricted to the number of replications, and this limits its field of application. It is suitable for 5 to 8 treatments; for more than 10 the design is seldom used, since in that case the square becomes too large and does not remain homogeneous.
- In the case of missing values, when several units are missing, the statistical analysis becomes more complicated.

FACTORIAL EXPERIMENT
A factorial experiment is only an experiment and not a design; it has to be carried out following any one design: CRD, RBD or LSD. The design most frequently used for factorial experiments is RBD. So far we were considering single factors as treatments. In factorial experiments, treatments consist of combinations of two or more factors, each at two or more levels. The combination of treatments is such that each level of every factor occurs together with each level of every other factor. The number of treatments is the product of the numbers of levels of all the factors. If we have two factors each at two levels, we say that it is a 2 × 2 factorial experiment; if we have three factors each at two levels, we have a 2 × 2 × 2, i.e. 2³, factorial experiment. If all the factors have equal numbers of levels, it is a symmetrical factorial experiment; in general, if there are n factors each at m levels, the experiment is an mⁿ factorial experiment. If we have unequal levels, say 'a' levels of factor A, 'b' levels of factor B, 'c' levels of factor C, and so on, the experiment is an asymmetrical factorial experiment and is written as an a × b × c × ... factorial experiment.


Definition
The main effect of a factor is defined as the difference between the mean yields for the different levels of that factor, averaging over all levels of all the other factors.
The simple effect of a factor at a particular level of the other factors is the difference between the mean yield of the factor at the particular level and in the absence of that particular level.
The interaction between two factors is the variation of the differences between mean yields for different levels of one factor over different levels of the other factor. It is the failure of the differences in response to changes in levels of one factor to be the same at all levels of the other factor.
Example:
In a 2² factorial experiment, the power denotes the number of factors and the base denotes the number of levels of each factor.
The two factors are A and B, each at two levels: a0, a1 and b0, b1. The treatment combinations are four, viz. a0b0, a1b0, a0b1 & a1b1. Let a0b0, a1b0, a0b1 and a1b1 also denote the mean yields (of r replications, say) of the respective treatment combinations from the factorial experiment.
Simple effect:
The response to factor A at level b0 of B is (a1b0 - a0b0). The response to A at level b1 of B is (a1b1 - a0b1). These two are called the simple effects of A.
Main effect:
The average response to A, averaged over both levels of B, is

    [(a1b0 - a0b0) + (a1b1 - a0b1)] / 2

Similarly, the average response to B, averaged over both levels of A, is

    [(a0b1 - a0b0) + (a1b1 - a1b0)] / 2
Interaction effect:
Apart from the average response to A, we would like to know the differential response to A at different levels of B, if it exists. The measure of this is given by the difference in the response to A at the levels b1 and b0 of B, i.e.

    (a1b1 - a0b1) - (a1b0 - a0b0) = (a1b1 + a0b0) - (a1b0 + a0b1)

It will be seen that this is also the measure of the differential response to B at the different levels of A. This is termed the interaction between factors A and B, and is symbolized AB.
Sum of squares for main and interaction effects:
Between the four treatment combinations there are 3 d.f. These have been partitioned into 3 meaningful single d.f:

    Main effect A:   SS = r/4 [(a1b0 + a1b1) - (a0b0 + a0b1)]²   with d.f = 1
    Main effect B:   SS = r/4 [(a0b1 + a1b1) - (a0b0 + a1b0)]²   with d.f = 1
    Interaction AB:  SS = r/4 [(a1b1 + a0b0) - (a1b0 + a0b1)]²   with d.f = 1
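A minimal sketch of these single d.f sums of squares; the four arguments are the treatment-combination means, as above, and the data shown are hypothetical.

    def factorial_22_ss(a0b0, a1b0, a0b1, a1b1, r):
        """Single-d.f sums of squares for a 2 x 2 factorial with r replications."""
        ss_A  = r / 4 * ((a1b0 + a1b1) - (a0b0 + a0b1)) ** 2
        ss_B  = r / 4 * ((a0b1 + a1b1) - (a0b0 + a1b0)) ** 2
        ss_AB = r / 4 * ((a1b1 + a0b0) - (a1b0 + a0b1)) ** 2
        return ss_A, ss_B, ss_AB

    # hypothetical treatment means with r = 4 replications
    print(factorial_22_ss(20.0, 26.0, 22.0, 30.0, r=4))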

Advantages of Factorial Experiment
When the factors are independent there are two advantages:
- All of the simple effects are equal to the main effect; hence the main effects are all that are needed to describe the action of a factor.
- Hidden replication: each main effect is estimated with the same precision as if the entire trial had been devoted to that factor alone. The factorial also provides a systematic set of factor combinations for estimating the interactions, each with equal precision.
Disadvantages of Factorial Experiment
- As the number of factors increases, the size of the experiment becomes very large: with 8 factors each at two levels there are 256 combinations in the factorial experiment. Not only are experiments with too many treatments costly to run, but it is also difficult to find sufficiently uniform material to form blocks to accommodate all the treatment combinations.
- Large factorials may be difficult to interpret, particularly when interactions are present.

Uses
1. In experiments where the aim is to examine a large number of factors and to determine which are important and which are not.
2. To study the relationships among several factors, in particular to determine the presence of interaction and its magnitude.
3. In experiments designed to lead to recommendations over a wide variety of conditions. Some of the conditions can be included as factors in the trial even though they are not the principal factors of interest.

Two Factor Experiment
The simplest kind of factorial experiment is one with two factors. For example, a food processor might be interested in the effects of storage temperature and the length of storage on the quality of frozen strawberries.
Storage temperature: t1 = -10°C, t2 = -20°C
Storage time: s1 = 1 month, s2 = 2 months
The factorial set of treatments in this case will be 4, i.e. t1s1, t1s2, t2s1, t2s2. The experiment could be run with an experimental design chosen to fit the conditions. Most often we use RBD; we need blocks with 4 experimental units to conduct the experiment. The data will be analyzed in such a way that the main effects of temperature and time, and the time × temperature interaction, will be estimated and tested.
In practice, the factors at different levels will not behave uniformly. When two factors interact, the response to changes in one factor is conditioned by the level of the other factor.
Data Analysis
To illustrate the process in general, suppose we have two factors, say factor A at 'a' levels and factor B at 'b' levels, and the experiment is done using an RBD with 'r' blocks, each containing 'ab' units. The model for an observation in this experiment is given by

    Yijk = μ + ρi + αj + βk + (αβ)jk + εijk

where
    Yijk is the response value of the jth level of factor A and kth level of factor B in the ith block;
    μ is the overall mean yield;
    ρi is the effect of the ith block, distributed normally with mean 0 and variance σρ², i.e. ρi ~ N(0, σρ²);
    αj is the added effect of the jth level of factor A, measured as a deviation from μ (Σj αj = 0);
    βk is the added effect of the kth level of factor B, measured as a deviation from μ (Σk βk = 0);
    (αβ)jk is the added effect of the combination of the jth level of factor A with the kth level of factor B, i.e. the Aj × Bk interaction effect, with Σj (αβ)jk = Σk (αβ)jk = 0;
    εijk is the random error, which again follows a normal distribution, εijk ~ N(0, σe²).

Step-wise procedure
Step 1: Ho: There is no significant difference between the different levels of factor A, of factor B, or of the interaction effect.
Step 2: Form the two-way table of treatment totals Tjk = Σi Yijk (totals over the r blocks):

                     Factor B
    Factor A      1      2      3     ...    b    | Total
    1            T11    T12    T13    ...   T1b   |  A1
    2            T21    T22    T23    ...   T2b   |  A2
    ...          ...    ...    ...    ...   ...   |  ...
    a            Ta1    Ta2    Ta3    ...   Tab   |  Aa
    Total        B1     B2     B3     ...   Bb    |  G

Here Aj = Σk Tjk and Bk = Σj Tjk, and the block totals are R1, R2, ..., Rr, so that

    G = A1 + A2 + .......... + Aa = B1 + B2 + ............ + Bb = R1 + R2 + ..... + Rr

    C.F. = G²/(rab)
Step 3: Calculation of the sums of squares
1. Total sum of squares (T.S.S.) = sum of squares of all the 'rab' response values - C.F. = Σi Σj Σk Yijk² - C.F.
2. Block sum of squares (B.S.S.) = (R1² + R2² + .......... + Rr²)/ab - C.F.
3. Sum of squares due to factor A (S.S. due to A) = (A1² + A2² + ....... + Aa²)/rb - C.F.
4. Sum of squares due to factor B (S.S. due to B) = (B1² + B2² + ......... + Bb²)/ra - C.F.
5. Sum of squares due to the interaction AB (S.S. due to AB) = (1/r) Σj Σk Tjk² - C.F. - S.S. due to A - S.S. due to B
6. Error sum of squares (E.S.S.) = T.S.S. - (B.S.S. + S.S. due to A + S.S. due to B + S.S. due to AB)

Step 4: Formation of the ANOVA table

Sources of variation   d.f                S.S             M.S
Blocks                 r - 1              B.S.S           BSS/(r - 1)
A                      a - 1              S.S due to A    SSA/(a - 1)
B                      b - 1              S.S due to B    SSB/(b - 1)
AB                     (a - 1)(b - 1)     S.S due to AB   SSAB/[(a - 1)(b - 1)]
Error                  (r - 1)(ab - 1)    E.S.S           ESS/[(r - 1)(ab - 1)]
Total                  rab - 1

Step 5: Interpretation
Compare the F values for blocks, factor A, factor B and the interaction AB with the corresponding table values, and declare them significant or not. If an effect is significant, find the critical difference:

    C.D. between any two levels of factor A at 5% (1%) = √(2EMS/rb) × tab ' t ' at 5% (1%) for error d.f
    C.D. between any two levels of factor B at 5% (1%) = √(2EMS/ra) × tab ' t ' at 5% (1%) for error d.f
    C.D. between any two blocks at 5% (1%) = √(2EMS/ab) × tab ' t ' at 5% (1%) for error d.f
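A numpy sketch of the whole two-factor computation from an (r, a, b) array of responses (blocks × levels of A × levels of B; a hypothetical layout):

    import numpy as np

    def two_factor_anova(y):
        """Sums of squares for an a x b factorial run in RBD; y has shape (r, a, b)."""
        r, a, b = y.shape
        CF = y.sum() ** 2 / (r * a * b)
        TSS = (y ** 2).sum() - CF
        BSS = (y.sum(axis=(1, 2)) ** 2).sum() / (a * b) - CF    # block totals
        SSA = (y.sum(axis=(0, 2)) ** 2).sum() / (r * b) - CF    # factor A totals
        SSB = (y.sum(axis=(0, 1)) ** 2).sum() / (r * a) - CF    # factor B totals
        SSAB = (y.sum(axis=0) ** 2).sum() / r - CF - SSA - SSB  # Tjk totals
        ESS = TSS - (BSS + SSA + SSB + SSAB)
        EMS = ESS / ((r - 1) * (a * b - 1))
        return {"A": (SSA / (a - 1)) / EMS,
                "B": (SSB / (b - 1)) / EMS,
                "AB": (SSAB / ((a - 1) * (b - 1))) / EMS}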

Three Factor Experiment
When a third factor is included in a factorial experiment, the principles of design selection and randomization remain unchanged. The number of treatment combinations, however, increases fairly rapidly, and the analysis of the resulting data becomes somewhat more complicated. We now have to estimate and test three main effects, three two-factor (first order) interactions, and one three-factor (second order) interaction.
Data analysis
Let us suppose that we have factor A at 'a' levels, factor B at 'b' levels and factor C at 'c' levels, and that we do the experiment in RBD with 'r' blocks, each containing a × b × c experimental units. Let us denote by Yijkl the response value of the jth level of factor A, kth level of factor B and lth level of factor C in the ith block. Then

    Yijkl = μ + ρi + αj + βk + γl + (αβ)jk + (βγ)kl + (γα)lj + (αβγ)jkl + εijkl

Table 1: Block totals

    Blocks:   1,   2,  ..................... r      Total
    Totals:   R1,  R2, .................... Rr      G = Σi Ri

Table 2: Factor A × Factor B totals, T.jk. = Σi Σl Yijkl

                     Factor B
    Factor A      1       2      ...    b     | Total
    1            T.11.   T.12.   ...   T.1b.  |  A1
    2            T.21.   T.22.   ...   T.2b.  |  A2
    ...          ...     ...     ...   ...    |  ...
    a            T.a1.   T.a2.   ...   T.ab.  |  Aa
    Total        B1      B2      ...   Bb     |  G

Here Aj = Σk T.jk. and Bk = Σj T.jk.

Table 3
Factor B x Factor C (cell totals T..kl = Σi Σj Yijkl)

                 Factor C
  Factor B     1       2      ...     c      Total
     1       T..11   T..12    ...   T..1c     B1
     2       T..21   T..22    ...   T..2c     B2
     ...
     b       T..b1   T..b2    ...   T..bc     Bb
   Total      C1      C2      ...    Cc        G

Table 4
Factor C x Factor A (cell totals T.j.l = Σi Σk Yijkl)

                 Factor A
  Factor C     1       2      ...     a      Total
     1       T.1.1   T.2.1    ...   T.a.1     C1
     2       T.1.2   T.2.2    ...   T.a.2     C2
     ...
     c       T.1.c   T.2.c    ...   T.a.c     Cc
   Total      A1      A2      ...    Aa        G

where Cl = Σj T.j.l.

If we carry out a two-way analysis of variance on each of Table 2, Table 3 and Table 4, we will get the sums of squares due to factor A, factor B and factor C, and the residual sums of squares of these two-way tables will correspond to the first order (two-factor) interactions, i.e. the sum of squares due to the A x B interaction, the sum of squares due to the B x C interaction and the sum of squares due to the C x A interaction, respectively.
Table 5 : Factor A x Factor B x Factor C

The three-way table of cell totals Y.jkl (each cell summed over the r blocks) is laid out with the a x b combinations of Factor A and Factor B as rows and the c levels of Factor C as columns; the row totals are Y.jk. and the grand total is G.

                          Factor C
  Factor A   Factor B     1       2      ...     c      Total
     1          1       Y.111   Y.112    ...   Y.11c    Y.11.
     1          2       Y.121   Y.122    ...   Y.12c    Y.12.
     ...
     1          b       Y.1b1   Y.1b2    ...   Y.1bc    Y.1b.
     2          1       Y.211   Y.212    ...   Y.21c    Y.21.
     ...
     a          b       Y.ab1   Y.ab2    ...   Y.abc    Y.ab.

C.F = G² / rabc

Total Sum of Squares (T.S.S) = Σi Σj Σk Σl Yijkl² − C.F
(i.e. the sum of squares of the response values of all rabc experimental units, minus C.F)

Block sum of squares (B.S.S) = (R1² + R2² + ... + Rr²) / abc − C.F

From Table 5, we calculate the sum of squares due to the A x B x C interaction as

SSABC = (1/r) Σj Σk Σl Y.jkl² − C.F − (SSA + SSB + SSC + SSAB + SSBC + SSCA)

Error sum of squares (E.S.S) = T.S.S − (B.S.S + SSA + SSB + SSC + SSAB + SSBC + SSCA + SSABC)
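The same breakdown extends directly to three factors in code. Below is a hedged Python sketch (not from the manual) that reproduces the whole sum-of-squares partition for a response array of shape (r, a, b, c):

    import numpy as np

    def three_factor_rbd_ss(y):
        # y[i, j, k, l] = response in block i at levels j, k, l of A, B, C
        r, a, b, c = y.shape
        cf = y.sum() ** 2 / y.size

        def marginal_ss(sum_axes, divisor):
            # total over sum_axes, square the totals, divide, subtract C.F
            return (y.sum(axis=sum_axes) ** 2).sum() / divisor - cf

        tss = (y ** 2).sum() - cf
        bss = marginal_ss((1, 2, 3), a * b * c)                # blocks
        ssa = marginal_ss((0, 2, 3), r * b * c)
        ssb = marginal_ss((0, 1, 3), r * a * c)
        ssc = marginal_ss((0, 1, 2), r * a * b)
        ssab = marginal_ss((0, 3), r * c) - ssa - ssb          # from Table 2
        ssbc = marginal_ss((0, 1), r * a) - ssb - ssc          # from Table 3
        ssca = marginal_ss((0, 2), r * b) - ssc - ssa          # from Table 4
        ssabc = ((y.sum(axis=0) ** 2).sum() / r - cf           # from Table 5
                 - (ssa + ssb + ssc + ssab + ssbc + ssca))
        ess = tss - (bss + ssa + ssb + ssc + ssab + ssbc + ssca + ssabc)
        return dict(blocks=bss, A=ssa, B=ssb, C=ssc, AB=ssab, BC=ssbc,
                    CA=ssca, ABC=ssabc, error=ess, total=tss)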
Table 6

  Sources of variation    Degrees of Freedom    Sum of squares    Mean squares
  Due to Factor A         a−1
  Due to Factor B         b−1
  Due to Factor C         c−1
  A x B                   (a−1)(b−1)
  B x C                   (b−1)(c−1)
  A x C                   (a−1)(c−1)
  A x B x C               (a−1)(b−1)(c−1)
  Error                   (r−1)(abc−1)
  Total                   rabc−1

(Each mean square is the corresponding sum of squares divided by its degrees of freedom.)

Critical difference between any two blocks = √(2EMS/abc) × tab. t for error d.f at 5% (1%) level

Critical difference between any two levels of A = √(2EMS/rbc) × tab. t for error d.f at 5% (1%) level

Critical difference between any two levels of B = √(2EMS/rac) × tab. t for error d.f at 5% (1%) level

Critical difference between any two levels of C = √(2EMS/rab) × tab. t for error d.f at 5% (1%) level

Critical difference between any two interaction AB means = √(2EMS/rc) × tab. t for error d.f at 5% (1%) level

Critical difference between any two interaction BC means = √(2EMS/ra) × tab. t for error d.f at 5% (1%) level

Critical difference between any two interaction CA means = √(2EMS/rb) × tab. t for error d.f at 5% (1%) level

Critical difference between any two interaction ABC means = √(2EMS/r) × tab. t for error d.f at 5% (1%) level

Three important transformations

1. Arc sine transformation or Angular transformation
2. Logarithmic transformation
3. Square root transformation

Arc Sine transformation

This is done when the values are in proportions or in percentages. If the percentage values are from 30 - 70, actually there is no need for angular transformation. If there is a 0 or 100, corresponding to 0 we must take the value (1/4n) × 100, where 'n' is the total count on which the proportion or percentage is based. Corresponding to 100 it is (1 − 1/4n) × 100.

Logarithmic transformation

If the values are exponential (e.g. microbial counts), we have to take the logarithm of all the values and then do ANOVA. If 0 occurs, add 1 to all the values and then take the logarithm.

Square root transformation

If the values follow a Poisson distribution, we take the square root of all the values and then do the analysis of variance. If 0 occurs, we add 1 or ½ to all the values and then take the square root, i.e. √(x + 1) or √(x + ½) will be taken. Mostly √(x + ½) is used.


ANALYSIS OF COVARIANCE (ANACOVA)


Analysis of covariance is a technique that combines the features of analysis of variance and regression. Even when the experimental units have been grouped into blocks to try to make the within-block variation small, there may still be substantial variation within the blocks. Sometimes an initial measurement (X) can be made on each experimental unit which, in the absence of treatment effects, might be expected to be strongly correlated with the yield variable Y. The accuracy of the experiment can then be improved by adjusting the values of the Y variable by this initial variable X, often referred to as the concomitant variable or auxiliary variable. For example, pigs in the same litter would be expected to react similarly to a particular treatment, but due to considerable differences in the initial weights there may be differences in the final weights, or in whatever response variable is studied. If we want to eliminate the variation in Y due to X, the adjustments we make will reduce the effect of the initial weights and thus we get a more valid estimate of the treatment effects. This procedure is known as analysis of covariance.
With the analysis of variance (ANOVA) carried out on the two variables X and Y, we may have the following situations:

1. X not significant, and Y significant
2. X not significant, and Y not significant
3. X significant, and Y significant
4. X significant, and Y not significant

In the first two cases, there is no need for ANACOVA. In the last two cases, we have to do ANACOVA.
Uses of ANACOVA

1. To increase precision in a randomized experiment.
2. To adjust for sources of bias in observational studies.
3. To throw light on the nature of true effects in randomized experiments.
4. To study regression in multiple classifications.

One way ANACOVA

The linear model here is

Yij = µ + ti + B(Xij − X̄) + Eij

where
Yij is the response value of the jth unit of the ith treatment
µ is the general mean effect
ti is the ith treatment effect
(Xij − X̄) is the deviation of the concomitant variable in the jth unit of the ith treatment
B is the regression coefficient
Eij is the random error component, which is independently and identically normally distributed with mean 0 and variance σe²
The sums of squares of X and Y and the sum of products XY are calculated as follows. Let us denote:

Gx = sum of all X's
Gy = sum of all Y's
Gxy = sum of the products XY over all experimental units

C.Fx = Gx² / N, where N is the total number of experimental units
C.Fy = Gy² / N
C.Fxy = Gx Gy / N

Total sum of squares for X (Txx) = sum of squares of all X − C.Fx
Total sum of squares for Y (Tyy) = sum of squares of all Y − C.Fy
Total sum of products XY (Txy) = sum of all products XY − C.Fxy

T1x, T2x, ..., Ttx = totals of the X values in treatment 1, treatment 2, ..., treatment t
T1y, T2y, ..., Tty = totals of the Y values in treatment 1, treatment 2, ..., treatment t

Treatment sum of squares for X (Trxx) = (T1x² + T2x² + ... + Ttx²) / n − C.Fx

Treatment sum of squares for Y (Tryy) = (T1y² + T2y² + ... + Tty²) / n − C.Fy

Treatment sum of products XY (Trxy) = (T1x T1y + T2x T2y + ... + Ttx Tty) / n − C.Fxy

where n is the number of units in each treatment.

Error sum of squares for X (Exx) = Txx − Trxx
Error sum of squares for Y (Eyy) = Tyy − Tryy
Error sum of products XY (Exy) = Txy − Trxy
Table 1: Sums of Squares / Sums of Products

  Source of variation    Df     XX      XY      YY
  Treatments             t−1    Trxx    Trxy    Tryy
  Error                  N−t    Exx     Exy     Eyy
  Total                  N−1    Txx     Txy     Tyy

Regression coefficient (b) = Exy / Exx

Correction due to regression = Exy² / Exx
Table 2: Regression Analysis

  Source of variation    Df       S.S                 M.S
  Regression             1        Exy²/Exx            Exy²/Exx
  Residual               N−t−1    Eyy − Exy²/Exx      (Eyy − Exy²/Exx)/(N−t−1)
  Error                  N−t      Eyy

F = (Exy²/Exx) ÷ [(Eyy − Exy²/Exx)/(N−t−1)]

If F is not significant, then the regression coefficient is not significant and there is no need for ANACOVA. If F is significant or highly significant, then the regression coefficient is significant or highly significant; there is a possibility of influence of X on Y, and ANACOVA is a must.
Table 3: Adjusted Analysis

  Sources               DF     SSx           SSxy          SSy           Adjusted SS for Y
  Error + Treatments    N−1    Exx + Trxx    Exy + Trxy    Eyy + Tryy    L = (Eyy + Tryy) − (Exy + Trxy)² / (Exx + Trxx),  df N−2
  Error                 N−t    Exx           Exy           Eyy           M = Eyy − Exy² / Exx,  df N−t−1

(In each line one degree of freedom is absorbed by the fitted regression.)

Table 4: ANOVA with adjusted treatment SS and error SS

  Source of variation    Df       S.S     M.S
  Treatment              t−1      L−M     (L−M)/(t−1) = adj. TrMS
  Error                  N−t−1    M       M/(N−t−1) = adj. EMS

F = adj. TrMS / adj. EMS

If F is not significant, then the Y's do not differ significantly. If F is significant or highly significant, find the critical difference between any two adjusted treatment means of Y.
Adj. Ȳi = Ȳi − b(X̄i − X̄)

The critical difference between any two adjusted means of Y at 5% (1%) is given by

√{ (2 adj. EMS / n) [1 + TrMSx / Exx] } × t for error d.f at 5% (1%)

where TrMSx = Trxx / (t−1) is the treatment mean square for X.
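The whole one-way computation, down to the adjusted F test, can be sketched in Python as follows (not part of the manual; x, y and treat are assumed to be equal-length arrays, with treat holding the treatment labels):

    import numpy as np

    def one_way_anacova(x, y, treat):
        x, y, treat = map(np.asarray, (x, y, treat))
        N, levels = len(y), np.unique(treat)
        t = len(levels)

        def total_and_treatment(u, v):
            cf = u.sum() * v.sum() / N
            total = (u * v).sum() - cf                          # Txx / Tyy / Txy
            tr = sum(u[treat == g].sum() * v[treat == g].sum()
                     / (treat == g).sum() for g in levels) - cf  # Trxx / Tryy / Trxy
            return total, tr

        Txx, Trxx = total_and_treatment(x, x)
        Tyy, Tryy = total_and_treatment(y, y)
        Txy, Trxy = total_and_treatment(x, y)
        Exx, Eyy, Exy = Txx - Trxx, Tyy - Tryy, Txy - Trxy
        b = Exy / Exx                                           # regression coefficient
        M = Eyy - Exy ** 2 / Exx                                # adjusted error SS
        L = (Eyy + Tryy) - (Exy + Trxy) ** 2 / (Exx + Trxx)     # adjusted (treat + error) SS
        F = ((L - M) / (t - 1)) / (M / (N - t - 1))
        return b, L - M, M, F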

Two way analysis of covariance

The model is given by

Yij = µ + ti + bj + B(Xij − X̄) + Eij

where
µ is the general mean effect
ti is the ith treatment effect
bj is the jth block effect
B is the regression coefficient
Eij is the random error component, which is independently and identically normally distributed with mean zero and variance σe²

Block sum of squares for X (Bxx) = (B1x² + B2x² + ... + Bbx²) / t − C.Fx

where C.Fx = Gx² / N, N = bt, and Bix = sum of X in the ith block.

Block sum of squares for Y (Byy) = (B1y² + B2y² + ... + Bby²) / t − C.Fy

where C.Fy = Gy² / N and Biy = sum of Y in the ith block.

Block sum of products XY (Bxy) = (B1x B1y + B2x B2y + ... + Bbx Bby) / t − C.Fxy

where C.Fxy = Gx Gy / N.

Calculate Trxx, Tryy, Trxy, Txx, Tyy, Txy as in the previous case. Then

Exx = Txx − (Trxx + Bxx)

and Exy and Eyy are calculated similarly.

Table 1: Sums of Squares / Products

  Source of variation    Df            XX      XY      YY
  Blocks                 b−1           Bxx     Bxy     Byy
  Treatments             t−1           Trxx    Trxy    Tryy
  Error                  (b−1)(t−1)    Exx     Exy     Eyy
  Total                  bt−1          Txx     Txy     Tyy
Table 2: Regression Analysis

  Source of variation    DF              SS                MS
  Due to regression      1               Exy²/Exx          Exy²/Exx
  Residual               (b−1)(t−1)−1    Eyy − Exy²/Exx    (Eyy − Exy²/Exx)/[(b−1)(t−1)−1]

F = (Exy²/Exx) ÷ { (Eyy − Exy²/Exx) / [(b−1)(t−1)−1] }

If this F is not significant, then there is no need for ANACOVA, i.e. there is no significant influence of X on Y. If this F is significant or highly significant, we have to do ANACOVA.
Table 3: Adjusted Sums of Squares

  SV                   DF            SSx           SSxy          SSy           Adjusted SS for Y
  Treatment + Error    b(t−1)        Trxx + Exx    Trxy + Exy    Tryy + Eyy    L = (Tryy + Eyy) − (Trxy + Exy)² / (Trxx + Exx)
  Error                (b−1)(t−1)    Exx           Exy           Eyy           M = Eyy − Exy² / Exx
Table 4

  Source       df              S.S     M.S
  Treatment    t−1             L−M     (L−M)/(t−1)
  Error        (b−1)(t−1)−1    M       M/[(b−1)(t−1)−1]

If F is not significant, there is no significant difference among the treatments in their effect on Y, even though there is a significant influence of X on Y. If F is significant, we have to calculate the adjusted means of Y and the critical difference as in one way ANACOVA and then locate the significant differences among the treatments.

The critical difference between any two adjusted means of Y at 5% (1%) is given by

√{ (2 adj. EMS / n) [1 + TrMSx / Exx] } × t for error d.f at 5% (1%)

where n, the number of replicates per treatment, is here the number of blocks b.
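For routine work the same two-way ANACOVA can be fitted with the statsmodels library, entering blocks and treatments as factors and X as the covariate. A hedged sketch, with illustrative (invented) data:

    import pandas as pd
    import statsmodels.api as sm
    import statsmodels.formula.api as smf

    # illustrative data: 3 treatments x 2 blocks, one unit per cell
    df = pd.DataFrame({
        "y":     [12.1, 13.4, 15.0, 11.8, 13.9, 15.6],
        "x":     [10.0, 11.2, 12.5, 10.3, 11.0, 12.9],
        "treat": ["A", "B", "C", "A", "B", "C"],
        "block": [1, 1, 1, 2, 2, 2],
    })
    fit = smf.ols("y ~ C(block) + C(treat) + x", data=df).fit()
    print(sm.stats.anova_lm(fit, typ=1))   # SS for blocks, treatments and the covariate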


STATISTICAL QUALITY CONTROL


Meaning

Quality Control is the maintenance of quality, or of certain prescribed standards, in a uniform flow of manufactured products. The principles of Statistical Quality Control can be applied wherever we are concerned with the establishment of uniform standards of quality in a continuous flow of products, or where the units or parts of the flow are spaced consecutively in time. These standards may be in terms of size, weight, strength, colour, taste etc. Some of these standards are easy to define in numerical terms. For others, the qualitative characteristics have to be defined and the numbers falling in each class will have to be counted.

The quality standards are normally set by the makers of the product. The quality consciousness amongst producers is always greater when there is competition from rival producers and when the consumers are quality conscious. The continuing patronage of customers depends a great deal on the maintenance of quality standards.

Statistical Quality Control is the planned collection and effective use of data for studying causes of variation in quality, either as between processes, procedures, materials, machines etc., or over periods of time. This cause-and-effect analysis is then fed back into the system with a view to continuous action on the processes of handling, manufacturing, packaging, transporting and delivery at end use.
Different types of Quality Measures

Many quality characteristics are measurable quantitatively and can be expressed as variables, e.g. the fat content of milk, the chemical composition of a drug etc. All these are continuous variables, which is generally the case. Sometimes the variable may also be discrete, e.g. the number of diseased animals.

Often, the quality cannot be measured and is expressed as an attribute. Here the items may be classified as good (or non-defective) and defective ones. Thus a bolt which does not fit the nut is defective. Also, an item which contains one or more defects is defective. Again, although a quality may be measurable, one may decide to treat it as an attribute for the sake of economy. Thus a manufacturer producing rods of a certain length may classify a rod as defective if it is too long or too short.
Uses of Statistical Quality Control

The methods of Statistical Quality Control are used widely in production, storage, packing and transportation, the tests being confined to only a part of the whole lot and at times made only at suitable intervals. These methods have saved a lot of time and expenditure otherwise involved in full inspection, especially when the tests involve the destruction of the product, as in testing the quality of eggs, testing the blood group etc.

Advantages of Statistical Quality Control

- An objective check is maintained on the quality of the product.
- It has a healthy influence on the workers, for they know that quality is being checked.
- If producers have strict quality control, the users may rely on it and may not resort to a thorough check.
- The quality can be defended or maintained before any Governmental inquiry on the basis of statistical quality records.
- The degree of check can be related to the precision required in each process and the past performance, thus economizing the cost of inspection.
- A good deal of data becomes available. The data on the average level of performance and the average range of variability can be used by the management for the choice of plant and machinery as well as technical staff. In other words, these data can help in the evaluation of the men and equipment, besides the process and product.
- Incidentally, the efficient working life of machinery can be determined, so that it is discarded at the right time, when it fails to produce goods of the desired specification despite necessary maintenance and adjustments.
Basis of Statistical Quality Control

The basis of Statistical Quality Control is the degree of variability in the size or the magnitude of a given characteristic of the product. Some amount of variability is bound to be there, however scientific and accurate the production process is. The various causes of variation may be classified into:

a) Specific and identifiable, or preventable
b) Random and chance, or allowable

a) Specific and Identifiable

These causes include those arising on account of an inexperienced worker, a fault in the tool or design, or a defect in the materials used.

b) Random and Chance

These causes have nothing to do with any latent or patent defect in the production process. They arise in the process of taking out samples and drawing inferences. It is difficult to assign any specific cause for these variations.
Purpose of Statistical Quality Control

The main purpose of Statistical Quality Control is to separate the assignable causes from the chance or random causes. Here we are more interested in the variations within the samples and not between samples, the latter being present irrespective of the fact that the lots are drawn during a continuous process. The statistical procedure for achieving this object is to lay down the limits of chance variations (including the between-sample variations). Any variation beyond those limits must be due to assignable causes within samples. If it is found that the process is out of control, the specific causes may be looked into through technical examination of the production process at its various stages. Even if each stage of the whole production process is under control, statistical methods are necessary to check the uniformity of their standards.

Even if the segregation of assignable and chance causes is not fully achieved, the method of quality control will ensure that the variations are not so serious as to damage the goodwill of the product. Thus the whole basis of statistical quality control is the degree of variability: whether it is within the tolerable limits due to chance only, so that the product is acceptable, or not.
Types of Control

There are two broad ways of statistically controlling the quality of the product, viz. process control and product control.

Process Control

This is concerned with controlling the quality of goods manufactured in the process of production. Process control detects whether the production process is going on in the desired fashion. In other words, it controls the quality of the goods to be produced. It ensures that the machines are turning out products of the requisite standard. The statistical tool applied in process control is the Control Chart. The primary objectives of process control are (a) to keep the manufacturing process in control so that the proportion of defective units is not excessive and (b) to determine whether a state of control exists.
Product Control

This is concerned with the inspection of materials to determine their acceptability, whether they be in a raw, semi-finished or complete state. This is known as acceptance inspection or sampling inspection. The object of acceptance inspection is to evaluate a definite lot of material that is already in existence and about whose quality a decision must be made. This is done by inspecting a sample of the material, using definite statistical standards to infer from the quality of the sample whether the whole lot is acceptable. The standards in acceptance inspection are set according to what is required of the product, rather than by the inherent capabilities of the process, as in process control.

Even if the process is under control, individual products may turn out to be non-acceptable; also, it is not necessary that product control of the inputs will ensure that the process is under control. Actually, process control is concerned mostly with operations, machines and hands, while product control is concerned with the quality of the product turned out. Certainly a good process control will not require a strict product control.
Techniques

Process Control (by Control Charts):
    for Variables  - X̄ chart, R chart
    for Attributes - C chart, np chart, p chart

Product Control (by Sampling Inspection):
    for Variables and for Attributes
Control Chart

A Control Chart is a statistical device principally used for the study and control of repetitive processes.

Dr. Walter A. Shewhart, its originator, suggested that the control chart may serve, first, to define the goal or standard for the process that the management might strive to attain; secondly, as an instrument to attain that goal; and thirdly, as a means of judging whether the goal is being achieved. Thus it is an instrument to be used in specification, production and inspection, and is the core of Statistical Quality Control.

A Control Chart is essentially a graphic device for presenting data so as to directly reveal the frequency and extent of variations from established standards or goals. Control charts are simple to construct and easy to interpret, and they tell the manager at a glance whether or not the process is in control, i.e. within the tolerance limits. A Control Chart consists of three horizontal lines:
- a central line, to indicate the desired standard or level of the process;
- an Upper Control Limit; and
- a Lower Control Limit.


From time to time, a sample is taken and the data are plotted on the graph. So long as

the sample point falls within the upper and lower control limits there is nothing to worry as in
such a case, the variation between the samples is attributed to chance or unknown causes.
It is only when a sample point falls outside the control limits that it is considered to be a
danger signal indicating that assignable causes at bringing about variations. Thus there is no
wastage of time and money in an effort to find the reason for random variation but as soon as
assignable cause is apparent, necessary corrective action is taken. Generally, of all dots are
found between the upper and lower control limits it is assumed that the process is "in control'
and only chance causes are present. However, sometimes dots are found arranged in some

80

STATISTICAL QUALITY CONTROL

peculiar way.

Although they appear between the control limits a substantial number of

successive dots may be located on the same side of the Central line or around control limit or
successive dots may follow a definite path leading towards the upper and lower control limit.
Such patterns of dots within control limit should also be considered as danger signals, which
may indicate a change in the production process. Thus control charts are not only watched for
pOints falling outside the control limits, they are also scrutinized for unusual patterns suggesting
trouble.

How to set up the Control Limits

The basis of the Control Chart is the setting up of the upper and lower control limits. These limits are used as a basis for judging the significance of the quality variation from sample to sample, lot to lot or from time to time. The moment a point falls outside these limits, it is taken to be a danger signal. The control limits serve as a guide for action and are therefore also referred to as action limits. Control limits are established by computation based upon:

a) data covering past and current production records; and
b) statistical formulae whose reliability has been proved in practice.
Parameter: Those characteristics pertaining to the population are called parameters.

Statistic: Those pertaining to the sample are called statistics.

Let θ be the parameter, say the breaking strength of the material, and let T be the corresponding statistic. If the process is under control then the value of θ must be the same for all sub-groups. In fact, even if the process is under control, there will be small variations in the value of T from one sub-group to another. These variations are allowable, as they occur due to chance. Let us now fix a norm for the allowable variations so that, if the process is under control, the value of T will lie between the norm values. If T falls outside these norm values, we say that there is a specific systematic variation. Let µT be the mean value of T over the different sub-groups and σT² its variance. Then the limits are given by µT − 3σT and µT + 3σT. Here the assumption is that the statistic follows the normal distribution with mean µT and S.D σT. We know that in the case of the normal distribution 99.73% of the observations will lie within the 3σ limits. In adopting these limits the chance of going wrong is only 0.27%, i.e. nearly 3 out of 1000. The value µT − 3σT is taken as the Lower Control Limit (LCL) and the value µT + 3σT is taken as the Upper Control Limit (UCL).

Major parts of a Control Chart

A Control Chart generally includes the following four major parts:

1. Quality Scale: This is a vertical scale, marked according to the quality characteristic (either in variables or attributes) of each sample.

2. Plotted Samples: The qualities of individual items of a sample are not shown on a control chart. Only the quality of the entire sample, represented by a single value (a statistic), is plotted, in the form of a dot, a cross or a small circle.

3. Sample Numbers: The samples plotted on a Control Chart are numbered individually and consecutively on a horizontal line, usually placed at the bottom of the chart. The samples are also referred to as sub-groups in Statistical Quality Control. Generally 25 sub-groups are used in constructing X̄ and R control charts.

The success of the control chart technique depends largely upon the efficient grouping of items into samples, such that the variation in quality among items within the same sample is small, but the variation between one sample and another is as large as possible. Such a sample is known as a "rational sub-group". The obvious way of selecting sub-groups is the order of production. Here each sub-group consists of products made during a short period, so that there will not be any remarkable change within that period. The use of such sub-groups is that they will reveal the causes of variation. There may also be causes which will not be revealed by these sub-groups.

4. Horizontal Lines: The central line represents the average quality of the samples plotted on the chart. The line above the central line is the Upper Control Limit, which is obtained by adding 3σ to the mean, i.e. mean + 3(S.D). The line below the central line is the Lower Control Limit, which is mean − 3(S.D).

Types of Control Charts

Control Charts can be divided into two categories:

(a) control charts for variables, and (b) control charts for attributes.

Variables are those quality characteristics of a product which are measurable and can be expressed in specific units of measurement, such as the diameter of radio knobs, which can be measured and expressed in centimeters, or the tensile strength of cement, which can be expressed in specific measures per square inch. Attributes, on the other hand, are those product characteristics which are not amenable to measurement. Such characteristics can only be identified by their presence or absence in the product. For example, we may say that plastic is cracked or not cracked, or whether bottles that have been manufactured contain holes or not. Attributes may be judged either by the proportion of units that are defective or by the number of defects per unit. Thus the data resulting from the inspection of a quality characteristic may take any one of the following forms:

(i) a record of the actual measurements of the quality characteristic for individual articles or specimens;
(ii) a record of the number of articles or specimens inspected and of the number found defective;
(iii) a record of the number of defects found in each sample (the number of opportunities for defects per sample may be very large compared to the average number of defects actually found per sample).

For purposes of control, data of the first form (i) may be summarized by taking two statistical measures, the average (X̄) and the standard deviation (σ), or the average (X̄) and the range R. Data of the second form (ii) can be summarized in terms of the fraction defective (p), and data of type (iii) in terms of the number of defects per unit (c).

Mostly, variables which are measurable are of the continuous type, i.e. of the type whose frequency distribution follows the normal law, which is generally the case. For the control of such data, two types of control charts are used: one for the mean of the measurements (X̄ chart) and another for the range of the measurements (R chart).

The data of type (iii) are discrete, such as the number of defects on a glass sheet, the number of surface defects on a floor etc. In such cases the number of defects on an item may be nil, one, two or more, and the distribution of the number of items according to the number of defects on them, when the process is in control, will be a Poisson distribution. Under such circumstances the control chart for the average number of defects per item, the C-chart, is used.

For data of type (ii), the binomial distribution explains the chance variations in the proportion of defectives, provided the sample selected from the lot is relatively small. In such cases the control chart for the proportion of defectives (p-chart) is used.

X̄ Chart

The chart is constructed on the basis of a series of samples drawn frequently during the production process, which are called rational sub-groups. Usually small sub-groups of size 4 or 5 units are preferred, and at least 25 sub-groups are used in the evaluation of the control limits.

Construction of X̄ Chart

(i) When the standard deviation of the population is given as σ:

UCL = X̄ + 3σx̄ and LCL = X̄ − 3σx̄

where X̄ here denotes the mean of the sample means and σx̄ = σ/√n is the standard error of the sample mean.

(ii) When the standard deviation is not given:

UCL = X̄ + 3R̄/(d2√n) and LCL = X̄ − 3R̄/(d2√n)

since σ̂ = R̄/d2 is an unbiased estimator of σ, d2 being a correction factor depending on the sub-group size. In practice we take

UCL = X̄ + A2R̄ and LCL = X̄ − A2R̄

where A2 = 3/(d2√n); the values of A2 are given in the table for n = 2 to 15.

(iii) UCL = X̄ + A1s̄ and LCL = X̄ − A1s̄, where s̄ is the mean of the sample standard deviations.

Then draw the control chart. If no point falls outside the control limits, consider the sub-groups a homogeneous group. If a point falls beyond the control limits, it is regarded that an assignable cause has thrown the process out of control. The next step is to remove all sample results which are outside the control limits and revise the control limits from the remaining samples. Compare all the remaining plotted points against the revised control limits; until all of the sample means are within the new control limits, the procedure of computation may be repeated.
R-Chart

It is used to show the variability or dispersion of the quality produced by a given process.

Though the standard deviation is the best measure of variation, the range is commonly used in Statistical Quality Control to study the pattern of variation in quality. This is due to the fact that for small samples of size, say, 15 or less, the range provides a good estimate of σ. The R chart is the companion chart to the X̄ chart, and both are usually required for an adequate analysis of the production process under study. It is generally presented along with the X̄ chart.

The required values for constructing an R chart are:

a) the range of each sample, R;
b) the mean of the sample ranges, R̄;
c) the Upper Control Limit and Lower Control Limit:

U.C.L(R) = R̄ + 3σR and L.C.L(R) = R̄ − 3σR, where σR = S.E. of the range.

In practice, we take U.C.L(R) = D4R̄ and L.C.L(R) = D3R̄.

Note: The R chart is used only when the sample size is small. If the size is >12, it is better to prefer a σ-chart.
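A small Python sketch (not from the manual) showing how the X̄ and R chart limits are computed from rational sub-groups, using the standard published constants A2, D3 and D4:

    import numpy as np

    # standard control-chart constants for sub-group sizes n = 2..10
    A2 = {2: 1.880, 3: 1.023, 4: 0.729, 5: 0.577, 6: 0.483,
          7: 0.419, 8: 0.373, 9: 0.337, 10: 0.308}
    D3 = {2: 0, 3: 0, 4: 0, 5: 0, 6: 0, 7: 0.076, 8: 0.136, 9: 0.184, 10: 0.223}
    D4 = {2: 3.267, 3: 2.575, 4: 2.282, 5: 2.115, 6: 2.004,
          7: 1.924, 8: 1.864, 9: 1.816, 10: 1.777}

    def xbar_r_limits(samples):
        # samples: one rational sub-group per row, all of the same size n
        samples = np.asarray(samples, dtype=float)
        n = samples.shape[1]
        xbar = samples.mean(axis=1)
        rng = samples.max(axis=1) - samples.min(axis=1)
        xbb, rbar = xbar.mean(), rng.mean()
        return {"xbar": (xbb - A2[n] * rbar, xbb, xbb + A2[n] * rbar),
                "R": (D3[n] * rbar, rbar, D4[n] * rbar)}

Any sub-group mean or range falling outside its (LCL, UCL) pair is then treated as a danger signal, and the limits are revised as described above.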

C-Chart

The C chart is designed to control the number of defects per unit. The control chart for C is used in situations wherein the opportunity for defects is large while the actual occurrence tends to be small; such situations are described by the Poisson distribution. This may occur in the case of the number of imperfections in a piece of cloth, the number of air bubbles in a piece of glass, the number of blemishes on a sheet of paper, etc. Let C stand for the number of defects counted in one unit of cloth (paper, glass, roll of wire) and C̄ for the mean of the defects counted in several (usually 25 or more) such units. The centre line of the control chart for C is C̄, and the 3-sigma control limits are C̄ ± 3√C̄.

Note: The sample size must be uniform while using the C-chart. If there is variation in the sample size, and if it is large, then we have to go in for the p-chart.
p-Chart

Control chart for p (fraction defective)

The p chart is designed to control the percentage or proportion of defectives per sample. Since the number of defectives (c) can be converted into a proportion, expressed as a decimal fraction, merely by dividing c by the sample size n, the p chart may be used in place of the C-chart. The p chart has at least two advantages over the C-chart:

- expressing the defectives as a percentage or fraction of production is more meaningful and more generally understood than a statement of the number of defectives; and
- where the size of the sample varies from sample to sample, the p-chart permits a more straightforward presentation.

The only disadvantage is that here we have to calculate c/n for every sample.

The control limits are p̄ ± 3σp, where

σp = S.E. of the proportion = √[ p̄(1 − p̄) / n ]

This chart has its theoretical basis in the binomial distribution and generally gives best results when the sample is large, say at least 50. It is usually preferred to express the results in terms of percent defective rather than fraction defective.
np Chart

An np chart shows the actual number of defectives found in each sample. If the number of items inspected on each occasion is the same, plotting the actual number of defectives may be more convenient and meaningful than the fraction defective. The construction and interpretation are similar to the p chart.

The control limits are np̄ ± 3σnp, where

σnp = √[ np̄(1 − p̄) ]
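The attribute-chart limits follow the same 3-sigma pattern; a brief Python sketch (not from the manual):

    import numpy as np

    def p_chart_limits(defectives, n):
        # defectives: number of defectives in each sample of constant size n
        pbar = np.sum(defectives) / (len(defectives) * n)
        se = np.sqrt(pbar * (1 - pbar) / n)
        return max(pbar - 3 * se, 0.0), pbar, pbar + 3 * se

    def c_chart_limits(defects):
        # defects: number of defects counted on each inspection unit
        cbar = np.mean(defects)
        return max(cbar - 3 * np.sqrt(cbar), 0.0), cbar, cbar + 3 * np.sqrt(cbar)

(The lower limit is truncated at zero, since a negative count is impossible.)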

Acceptance or Sampling Inspection Plans

The control charts described above cannot be applied to all types of problems; they are useful only for the regulation of the manufacturing process. Another important field of quality control is acceptance sampling. Inspection for acceptance purposes is carried out at many stages in manufacturing. For example, there may be inspection of incoming materials and parts, process inspection at various points in the manufacturing operations, final inspection by a manufacturer of his own product, and ultimately inspection of the finished product by one or more purchasers. Much of this acceptance inspection is carried out on a sampling basis. The use of sampling inspection by a purchaser to decide whether or not to accept the product is known as acceptance sampling or sampling inspection. A sample of the product is inspected, and if the number of defective items is more than a stated number, known as the acceptance number, the product is rejected. The standards in acceptance inspection are set according to what is required of the product, rather than by the inherent capabilities of the process, as in process control. The purpose of acceptance sampling is therefore to decide whether to accept or reject a product; it does not attempt to control quality during the manufacturing process, as do the techniques described earlier. Sampling inspection is referred to as product control, because it is designed to provide decision procedures under which a lot will be accepted or rejected.

Applications of Acceptance Sampling

- To determine whether a batch of items, called an inspection lot or simply a lot, that has been delivered by a supplier is of acceptable quality.
- To make sure that a lot that is complete and ready for shipment is of adequate quality.
- To determine whether partly completed material is of sufficiently high quality to justify further processing.

Uses of Acceptance Sampling

- Acceptance sampling is much less expensive than 100% inspection.
- Because of the laborious work involved in 100% inspection, a good sampling plan may actually give better quality assurance than 100% inspection.
- In modern manufacturing plants, acceptance sampling is used for evaluating the acceptability of incoming lots of raw materials and parts, at various stages of manufacture, and at the final inspection of the finished product.
- Where quality can be tested only by destroying the items, as in determining the strength of glass containers, 100% inspection is out of the question and sampling must be used. But there are situations where 100% inspection is essential; for example, rifles to be used by soldiers must each and every one be tested.

Types of inspection

Inspection may be visual (e.g. for surface defects or cracks) or by means of measurements (e.g. the percentage of carbon in steel, the length of life of an electric bulb). When the result of inspection is one or more measurements, we have inspection by variables, and when the result of inspection is merely a classification of the product into good or bad (acceptable or not acceptable), we have inspection by attributes.

The application of an acceptance plan for attributes is much simpler and less costly than that for variables, because it is easier to record or compute the conformity or non-conformity to specifications. Attribute sampling plans are thus more popularly used than variables plans.

Terms involved in Sampling Plans

Since under a sampling inspection plan a decision is made whether to accept or reject a lot on the basis of a sample, there is a possibility of (1) rejecting a lot as unsatisfactory when it is of acceptable quality and (2) accepting a lot as satisfactory when it is in fact below the required quality level. Hence in any acceptance plan the producers and consumers, the sellers and the buyers, are exposed to some risks, which are called the producer's and consumer's risks. The producer's risk is the risk a producer takes of a lot being rejected by a sampling plan even though it conforms to requirements. This is equivalent to the concept of Type I error, the probability of rejecting a hypothesis when it is in fact true. The consumer's risk is the risk that a lot of a certain quality will be accepted by a sampling plan when in fact it is below the expected or required standard. It is equivalent to Type II error, which is the probability of accepting a hypothesis when the hypothesis is false.

AQL & LTPD

In order to measure the consumer's risk, we must define the maximum percentage of defective items in lots which the consumer is willing to accept. This is called the Lot Tolerance Percentage Defective (LTPD). Similarly, to measure the producer's risk, we define a percentage of defective items at or below which a lot should be accepted. This is known as the Acceptable Quality Level or AQL. The producer's risk is now defined as the probability that a lot of AQL quality will be rejected, and the consumer's risk as the probability that a lot of LTPD quality will be accepted. The actual levels of the AQL and LTPD must be decided by negotiation between the consumer and the producer.

Operating Characteristic Curve (OC Curve)

In judging various acceptance sampling plans, it is desirable to compare their performance over a range of possible quality levels of the submitted product. An excellent picture of this performance is given by the operating characteristic curve. Such curves are commonly referred to as OC curves. The OC curve of an acceptance sampling plan shows the ability of the plan to distinguish between good and bad lots. For any given fraction defective p in a submitted lot, the OC curve shows the probability Pa that such a lot will be accepted by the sampling plan.

Shape of an Ideal OC curve

The ideal OC curve would be one for which all good lots are accepted and all bad lots are rejected. In practice, no sampling plan can have an OC curve of the ideal type. The degree to which an actual OC curve approximates the ideal curve depends upon n and c.

Construction of an OC curve

An OC curve can be determined by using either the Poisson distribution or the Thorndike chart. The Poisson distribution can be used in all situations where p is less than 0.10 (or where pn is less than 5) and the lot size is at least 10 times the size of the sample. In situations in which these conditions are not met, the theoretically correct approach is to use the binomial or the hypergeometric distribution. However, for most industrial situations, the Poisson distribution can be used without serious loss of accuracy.
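Under the Poisson approximation, the probability of acceptance Pa for a single sampling plan (n, c) is the Poisson probability of observing c or fewer defectives when the mean is np. A small Python sketch (not from the manual):

    from math import exp, factorial

    def pa_single(n, c, p):
        # Poisson approximation: Pa = P(defectives <= c), with mean m = n*p
        m = n * p
        return sum(exp(-m) * m ** d / factorial(d) for d in range(c + 1))

    # OC curve of the plan n = 20, c = 1 at a few lot qualities
    for p in (0.01, 0.05, 0.10, 0.20):
        print(p, round(pa_single(20, 1, p), 3))

Plotting Pa against p traces out the OC curve of the plan.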

Methods of Acceptance Sampling Plans

There are three methods of acceptance sampling plans:

1) Single Sampling Plan
2) Double Sampling Plan
3) Multiple or Sequential Sampling Plan

1) Single Sampling Plan

When the decision whether to accept or reject a lot is always made on the basis of only one sample, the acceptance plan is described as a single sampling plan. This is the simplest type of sampling plan. In any systematic plan for single sampling, three things are specified, namely (a) the number of items N in the lot from which the sample is to be drawn, (b) the number of articles n in the random sample drawn from the lot for inspection, and (c) the acceptance number c. The acceptance number is the maximum allowable number of defective articles in the sample; more than this will cause rejection of the lot.

Thus a sampling plan may be specified in this way: (a) N = 200, (b) n = 20, (c) c = 1, which means: draw a random sample of 20 items from the lot containing 200 items; if the sample contains more than one defective item, reject the lot; otherwise accept the lot.

Double Sampling Plan

Here we make a decision by examining at most two samples. Five things are specified:

1) n1 = number of pieces in the first sample;
2) c1 = acceptance number for the first sample (the maximum number of defectives that will permit acceptance of the lot on the basis of the first sample);
3) n2 = number of pieces in the second sample;
4) n1 + n2 = number of pieces in the two samples combined;
5) c2 = acceptance number for the two samples combined (the maximum number of defectives that will permit acceptance of the lot on the basis of the two samples).

Thus a double sampling plan may be:

N = 500, n1 = 20, c1 = 1, n2 = 60, c2 = 4

which will be interpreted as:

1) inspect a first sample of 20 from a lot of 500;
2) accept the lot on the basis of the first sample if it contains 1 or fewer defectives;
3) reject the lot on the basis of the first sample if it contains more than 4 defectives;
4) inspect a second sample of 60 if the first sample contains 2, 3 or 4 defectives;
5) accept the lot on the basis of the combined sample of 80 if the combined sample contains 4 or fewer defectives;
6) reject the lot on the basis of the combined sample if it contains more than 4 defectives.
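The acceptance probability of this double plan can be computed exactly with the binomial distribution; a Python sketch (not from the manual) of the rules just stated:

    from math import comb

    def binom_pmf(k, n, p):
        return comb(n, k) * p ** k * (1 - p) ** (n - k)

    def pa_double(n1, c1, n2, c2, p):
        # accept outright if d1 <= c1; take the second sample when c1 < d1 <= c2;
        # then accept if d1 + d2 <= c2
        pa = sum(binom_pmf(d, n1, p) for d in range(c1 + 1))
        for d1 in range(c1 + 1, c2 + 1):
            pa += binom_pmf(d1, n1, p) * sum(
                binom_pmf(d2, n2, p) for d2 in range(c2 - d1 + 1))
        return pa

    print(round(pa_double(20, 1, 60, 4, 0.02), 3))   # the plan above, at p = 2%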

Advantages of double sampling

1) It gives the lot a second chance.
2) From the producer's point of view, it would be unfair to reject a lot on the basis of a single sample.

Multiple or sequential sampling plan

Plans permitting three to an unlimited number of samples are described as multiple or sequential sampling plans. However, such plans are quite complicated and are rarely used in practice.
Average Outgoing Quality Limit (AOQL)

The expected fraction defective remaining in the lot after the application of the sampling plan is called the average outgoing quality (AOQ). This is a function of p, the actual fraction defective in the lot. The maximum value of the average outgoing quality, the maximum being taken with respect to p, is known as the Average Outgoing Quality Limit (AOQL).
Average Sample Number (ASN)

The expected value of the sample size required for coming to a decision (i.e. acceptance or rejection of a lot) under a sampling inspection plan is called the average sample number (ASN). This is naturally a function of p, the actual fraction defective of the lot. The curve obtained by plotting the ASN against p is called the ASN curve. Obviously, other factors remaining the same, the lower the ASN curve, the better is the sampling inspection plan.
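For a rectifying inspection scheme, the AOQ at lot quality p is commonly approximated as AOQ(p) = p × Pa(p) × (N − n)/N, and the AOQL is its maximum over p. A hedged Python sketch reusing pa_single() from the OC-curve example above:

    # assumes pa_single(n, c, p) from the OC-curve sketch is in scope
    N, n, c = 200, 20, 1
    aoq = [p * pa_single(n, c, p) * (N - n) / N
           for p in (i / 1000 for i in range(1, 301))]
    print(round(max(aoq), 4))   # approximate AOQL of the plan (n = 20, c = 1)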


TABLE 1: CRITICAL VALUES OF 't' DISTRIBUTION

(Two-tailed critical values of t at the 0.05 and 0.01 levels of significance, for degrees of freedom from 1 to 160. Reference table; the tabulated values are omitted here.)

TABLE 2: CRITICAL VALUES OF CHI-SQUARE DISTRIBUTION

(Critical values of chi-square at the 0.05 and 0.01 levels of significance, for degrees of freedom from 1 to 30. Reference table; the tabulated values are omitted here.)

TABLE 3: THE 5 x 5 LATIN SQUARES

First transformation set: 25 standard squares and their conjugates (squares 1 to 50).
Second transformation set: 6 self-conjugate standard squares (squares 51 to 56).

(Reference table; the individual lettered squares are omitted here.)

TABLE 4: CRITICAL VALUES OF SIMPLE CORRELATION COEFFICIENTS

(Critical values of r at the 0.05 and 0.01 levels of significance, for degrees of freedom from 1 to 700. Reference table; the tabulated values are omitted here.)

TABLE 5: THE F DISTRIBUTION, 5% POINTS

(5% points of F for n1 degrees of freedom of the greater mean square (columns) and n2 degrees of freedom of the smaller mean square (rows). Reference table; the tabulated values are omitted here.)

TABLE 6: THE F DISTRIBUTION, 1% POINTS

(1% points of F for n1 degrees of freedom of the greater mean square (columns) and n2 degrees of freedom of the smaller mean square (rows). Reference table; the tabulated values are omitted here.)

TABLE 7: CRITICAL VALUES OF 'r' FOR THE SIGN TEST

(Critical values of r at the 1% and 5% levels of significance, for n from 1 to 90. Reference table; the tabulated values are omitted here.)
TABLE 8: CRITICAL VALUES OF 'T' IN THE WILCOXON SIGNED RANK TEST

(Critical values of T for two-tailed tests at the 0.05 and 0.01 levels of significance, for n from 6 to 25. Reference table; the tabulated values are omitted here.)

I,' Value In Run Test

TABLE 9: CRITICAL VALUES OF 'r' IN THE RUN TEST

n1
n2

'--'---

4 5 6

9 10 11 12 13 14 15 16 17 18 19 20

2
2
3
3
3

2
3
3
3

2
3
3

3
4
5
6
7
8

9
10
11
12
13
14
15
16
17
18
19
20

2
2
2
2
2
2
2
2
2

2
2
2
2
2
2
2
2
2
3
3
3
3
3
3

2
2
2
3
3
3
3
3
3
3
3
4
4

4
4
4

2
2
3
3
3
3
3
4
4

4
4
4
4
4

5
5
5

2
2
3
3
3
3
4

4
4
4
5
5
5
5
5
5

6
6

4
4

4
5
5
5
5
5

5
5
5
6

6
6

6
6

6
6
6

4
4
5
5
5

6
6

7
7
7
7
8
8
8

7
7
7
7

2
3
3
4
5
5
5
6
6
7
7
7
7
8
8
8
8
9

2
3
4

2
2
3
4

5
5

5
6

6
7
7
7
8
8
8
9
9
9
9

7
7
7
8
8
8

2
2
3
4
5
5
6
6
7
7
8
8

2
2
3

5
5

5
6

5
6

7
7
8
8

7
7
8
8

7
8
8

5
6
7
7

2
3
4
5
5
6
7
8

8
9
9

8
9
9

9
9
9

9
9

10
10
11
11
11
12
12
13

10
10
11
11
12
12
13
13

9
9
9

2
3
3

2
3
4

2
3
4

10
10 10
9
9 10 10 11
9 10 10 11
10 10 11 11
10 10 11 12
..

9
9
10
10
11
11
11
12
12

2
3
4
5
6
6
7
8

2
3
4
5
6
6
7
8

8
9

9
9

10
10
11
11
12
12
13
13
13

10
10
11
12
12
13
13
13
14

Notes: The values of r given In Tables 9 and 10 are various critical values of r associated with selected
values of n1 and n2. For the one-sample run test, any value of r that is equal to or less than the value shown in
Table 9 or equal to or greater than the value shown in Table ~ 0 is significant at the 5 percent level. For the twosample run test, any value of r that is equal to or less than the vqlue shown in Table 9 is significant at the 5
percent level.

99

TABLE 10: CRITICAL VALUES OF 'r' IN THE RUN TEST (UPPER VALUES)

(Upper critical values of r for selected values of n1 and n2 from 2 to 20. Reference table; the tabulated values are omitted here.)