## TERMS RELATED TO SAMPLING THEORY:

POPULATION:
Population or universe refers to the aggregate of
statistical information on a particular characteristic of all the
members covered by an investigation or enquiry.
POPULATION SIZE:
Refers to the total number of members in a certain
group. A population may be finite or infinite.
SAMPLING:
Refers to the selection of a sample from the population
with a view to ascertaining the characteristics of the whole.
SAMPLE:
A sample is that part of the universe which we select for
the purpose of investigation.
CENSUS:
A census is an enquiry in which each and every unit of the
population is investigated for the characteristics under study.
Economic Method:
This method is much more economical than a complete census because
only a fraction of the population is studied.
Saves time as well as labour:
There is a saving of time & labour not only in conducting the sample
enquiry but also in processing, editing & analyzing the data.
Testing of Accuracy:
The accuracy of the sampling investigation can be tested by comparing
the results of two or more samples.
Detailed & Intensive Enquiry:
The number of units under study is kept limited & this makes it possible
to study them in detail & intensively.
Reliability:
If samples are of proper size & are drawn on proper grounds, the results of
sampling will be almost the same as those that would have been obtained by
the census method.
Misleading Results:
If a sample survey is not properly planned & carefully
executed, the results obtained may be unreliable &
misleading.
Need of Specialised Knowledge:
An efficient sampling requires the services of qualified,
skilled & experienced personnel.
Heterogeneous Units:
If the units of population are too heterogeneous, the
sampling techniques cannot give fair results.
Impossibility of Sampling:
Sometimes the universe may be so small that it may be
impossible to draw a representative sample from it.
TERMS RELATED TO HYPOTHESIS TESTING:
Hypothesis:
Hypothesis is an assumption about a population
parameter.
Null Hypothesis:
Null Hypothesis is the assumption which we
wish to test and whose validity is tested for
possible rejection on the basis of sample
information.
It is denoted by $H_0$.
Alternate Hypothesis:
Any hypothesis which is complementary to the
null hypothesis is called the alternate hypothesis.
It is denoted by $H_1$.
Hypothesis Testing:
Hypothesis testing is a process of making a
decision on whether to accept or reject an
assumption about the population parameter on the
basis of sample information at a given level of
significance.

Level of significance:
Level of significance is the maximum probability of
rejecting the null hypothesis when it is true.
It is usually expressed as a percentage and is denoted by $\alpha$.
For example: 5% LOS implies that there are about
5 chances out of 100 of rejecting the null hypothesis
when it is true, OR that we are about 95% confident
that we will make a correct decision.
Critical Region / Rejection Region:
The critical or rejection region is the region of the sample
space such that, if the computed value of the test statistic
lies in it, we reject the null hypothesis.
Critical Value:
Critical value is the value of the test statistic which separates
the critical region from the acceptance region. It lies
at the boundary between the regions of acceptance &
rejection.

[Figure: sampling distribution with the acceptance region in the centre, bounded by a critical value on each side.]

One Tailed Test:
It is also known as a one sided test. It is a test in
which the rejection region is located on one side or in one
tail of the distribution of the test statistic.
On the basis of the side or tail, it may be a left tail or a right
tail test.
For example, for testing a population mean of 50:

[Figure: two curves, each with its acceptance region marked. In the left tail test the rejection region lies in the left tail; in the right tail test it lies in the right tail.]

Left Tail Test: $H_0: \mu = 50$, $H_1: \mu < 50$
Right Tail Test: $H_0: \mu = 50$, $H_1: \mu > 50$
Two Tail Test:
It is also known as a two sided test. It is a test in which
the rejection region is located in both the tails of the probability
curve of the sampling distribution.
For example, for testing a population mean of 50:
$H_0: \mu = 50$
$H_1: \mu \neq 50$

[Figure: sampling distribution with the acceptance region in the centre and a critical value bounding the rejection region in each tail.]
Type I Error:
A type I error occurs when one rejects the null hypothesis
when it is true.
The probability of committing a Type I error is denoted by
$\alpha$, that is, the level of significance.
Type II Error:
A type II error occurs when one accepts the null hypothesis
when it is false.
The probability of committing a Type II error is denoted by $\beta$.

| True Situation | Test decides $H_0$ is True | Test decides $H_0$ is False |
| --- | --- | --- |
| $H_0$ is True | Correct Decision | Type I Error |
| $H_0$ is False | Type II Error | Correct Decision |
Parameter:
A parameter is a statistical measure based on each & every
item of the population.
A parameter shows the characteristics of the population.
Statistic:
A statistic is a statistical measure based on the items / observations
of a sample. Since the value of a statistic varies from sample
to sample, it is subject to sampling error and sampling fluctuations.
Notations Used for Parameter & Statistic:

| S. N. | Characteristics | Parameter | Statistic |
| --- | --- | --- | --- |
| 1 | Size | $N$ | $n$ |
| 2 | Mean | $\mu$ | $\bar{X}$ |
| 3 | Standard Deviation | $\sigma$ | $S$ |
The following are the steps involved in a test of
significance.
STEP 1: Setting up of a null hypothesis:
This step involves setting up the null and alternate
hypotheses. For example, to test the population mean of
50, the null hypothesis & alternate hypothesis may
be formulated as:

| Type of Test | Null Hypothesis ($H_0$) | Alternate Hypothesis ($H_1$) |
| --- | --- | --- |
| a) Two Tailed Test | $H_0: \mu = 50$ | $H_1: \mu \neq 50$ |
| b) One Tailed Test | | |
| (i) Right Tail | $H_0: \mu = 50$ | $H_1: \mu > 50$ |
| (ii) Left Tail | $H_0: \mu = 50$ | $H_1: \mu < 50$ |
STEP 2: Specify the test statistic to be used:

| S.No. | Test Statistic | Conditions |
| --- | --- | --- |
| 1 | Z - Test | For the test of hypothesis involving large sample size i.e. $n > 30$ |
| 2 | t - Test | For the test of hypothesis involving small sample size i.e. $n \leq 30$ |
| 3 | $\chi^2$ - Test | For testing the discrepancy between observed frequency and expected frequency |
| 4 | F - Test | For comparing several means at the same time |
STEP 3: Compute the value of test statistic:
Compute the value of the test statistic ($z$, $t$, $\chi^2$, $F$) used in testing.
STEP 4: Specifying level of significance:
In general, a 5% or 1% level of significance is taken.
STEP 5: Finding critical value:
Find the critical value of the test statistic used at the
selected level of significance from the table of the
respective sampling distribution ($z$, $t$, $\chi^2$, $F$).
STEP 6: Interpretation:
Specify the decision as follows:
If the critical value (table value) is greater than the
computed value then we accept the null hypothesis ($H_0$).
If the critical value (table value) is less than the
computed value then we reject the null hypothesis ($H_0$).
Test of Significance of Large Sample
A sample is regarded as large only if its size
exceeds 30.
Step 1: Setting up Ho & H1 as follows:
$H_0: \mu = \mu_0$ & $H_1: \mu \neq \mu_0$ (Two Tail Test)
$H_0: \mu = \mu_0$ & $H_1: \mu > \mu_0$ (Right Tail Test)
$H_0: \mu = \mu_0$ & $H_1: \mu < \mu_0$ (Left Tail Test)
where $\mu_0$ is the hypothesised value of the population mean.
Step 2: Appropriate test statistic to be used is the Z
test, when the sample size exceeds 30 or the
population SD is known.
Step 3: Calculate Standard Error of mean as follows:
$SE_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$ (if population SD is known)
$SE_{\bar{X}} = \frac{S}{\sqrt{n}}$ (if population SD is NOT known)
Step 4: Calculate value of Z as follows:
$$Z = \frac{\bar{X} - \mu}{SE_{\bar{X}}}$$
Where
$\bar{X}$ is Sample Mean
$\mu$ is Population Mean
Step 5 : Look for critical value of z at a given level
of significance (5% or 1%) from the normal
distribution table for two tail or one tail test.

| LOS | Two Tail Test | One Tail Test |
| --- | --- | --- |
| 1% | 2.58 | 2.33 |
| 5% | 1.96 | 1.64 |
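The critical values in the table can be reproduced from the standard normal distribution. The following is a minimal sketch using Python's standard library; the helper name `critical_z` is an illustrative assumption, not part of the original slides.

```python
from statistics import NormalDist


def critical_z(alpha: float, two_tailed: bool) -> float:
    """Critical value of z for a given level of significance (hypothetical helper)."""
    # A two tail test splits alpha equally between both tails.
    tail_area = alpha / 2 if two_tailed else alpha
    return NormalDist().inv_cdf(1 - tail_area)


for alpha in (0.01, 0.05):
    two = round(critical_z(alpha, two_tailed=True), 2)
    one = round(critical_z(alpha, two_tailed=False), 2)
    print(f"LOS {alpha:.0%}: two tail = {two}, one tail = {one}")
```

Some printed tables round the 5% one tail value to 1.645 rather than 1.64.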

STEP 6: Interpretation:
Specify the decision as follows:
If the critical value (table value) is
greater than the computed value then we
accept the null hypothesis ($H_0$).
If the critical value (table value) is
less than the computed value then we
reject the null hypothesis ($H_0$).
Illustration 1
A company manufacturing automobile tyres, finds that
tyre life is normally distributed with a mean of 40,000
km and standard deviation of 3,000 km. It is believed
that a change in the production process will result in a
better product and the company has developed a new
tyre. A sample of 100 new tyres has been selected. The
company has found that the mean life of these new
tyres is 40,900 km. Can it be concluded that the new
tyre is significantly better than the old one, using the
significance level of 0.01?
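As a hedged check of Illustration 1, the large sample steps can be sketched in Python. The function name `z_statistic` is an assumption made for illustration; the critical value 2.33 is the one tail value at 1% LOS from the table of critical values of z.

```python
from math import sqrt


def z_statistic(sample_mean, population_mean, population_sd, n):
    """Z = (sample mean - population mean) / (SD / sqrt(n))."""
    standard_error = population_sd / sqrt(n)
    return (sample_mean - population_mean) / standard_error


# H0: mu = 40,000 km; H1: mu > 40,000 km (right tail test)
z = z_statistic(40_900, 40_000, 3_000, 100)
critical = 2.33  # one tail test at 1% LOS
reject_h0 = z > critical
print(f"z = {z:.2f}, critical = {critical}, reject H0: {reject_h0}")
```

Under these assumptions the computed value exceeds the table value, so the null hypothesis is rejected and the new tyre can be regarded as significantly better.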
Illustration 2
An insurance agent has claimed that the
average age of policyholders who insure
through him is less than the average for all
other agents, which is 35 years. A random
sample of 40 policy holders who have
insured through him gave an average of 32
years with a standard error of 2 years.
Using the 5% level of significance, ascertain
whether the insurance agent's claim is
justifiable.
3. The Philips company claims that the length of life of its
electric bulb is 2000 hours with an SD of 30 hours. A
random sample of 25 showed an average life of 1940
hours with a standard deviation of 25 hours. At 5% LOS,
can we conclude that the sample has come from a
population with a mean of 2000 hours?
4. A random sample of 900 members is found to have a
mean of 3.4 cm. Could it be reasonably regarded as a
simple sample from a large population whose mean is
3.25 cm and standard deviation 2.4 cm?
5. From a normally distributed infinite number of iron bars
with mean and standard deviation of 4 m and 0.6 m
respectively, a sample of 100 bars is taken. If the sample
mean is 4.2 m, can the sample be called a truly random
sample?
Practical steps involved in two tailed test for difference
between the means of two samples:
Step 1: Setting up the hypothesis as follows:
$H_0$: There is no significant difference between the
means of two samples i.e. $\mu_1 = \mu_2$
$H_1$: There is a significant difference between the means of
two samples i.e. $\mu_1 \neq \mu_2$
Step 2: Appropriate test should be Z when the sample size
exceeds 30
Step 3: Calculation of Standard Error:

| Case | Standard Error (SE) |
| --- | --- |
| a) If samples of size $n_1$ & $n_2$ are drawn from the same population with standard deviation $\sigma$ | $SE = \sqrt{\sigma^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}$ |
| b) If samples of size $n_1$ & $n_2$ are drawn from two different populations with standard deviations $\sigma_1$ & $\sigma_2$: (i) $\sigma_1$ & $\sigma_2$ are known | $SE = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$ |
| (ii) $\sigma_1$ & $\sigma_2$ are unknown | $SE = \sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}$ |
Step 4: Calculation of value of Z
$$Z = \frac{\bar{X}_1 - \bar{X}_2}{SE}$$
Step 5: Look for the critical value of z at a given level
of significance (5% or 1%) from the normal
distribution table for a two tail test.

| LOS | 1% | 5% |
| --- | --- | --- |
| Two Tail Test | 2.58 | 1.96 |

STEP 6: Interpretation:
Specify the decision as follows:
If the critical value (table value) is greater than the
computed value then we accept the null hypothesis ($H_0$).
If the critical value (table value) is less than the computed
value then we reject the null hypothesis ($H_0$).
Illustrations
6. The mean yield of wheat from district A was 210 kg with
SD 10 kg per acre from a sample of 100 plots. In another
district B, the mean yield was 220 kg with SD 12 kg from a
sample of 150 plots. Assuming that the SD of the yield in the
entire state was 11 kg, test whether there is any significant
difference between the mean yield of crops in the two
districts.
7. A random sample of 200 villages was taken from Allahabad
district and the average population per village was found
to be 485 with SD of 50. Another random sample of 200
villages from the same district gave an average population
of 510 per village with SD of 40. Is the difference
between the averages of the two samples statistically significant?
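The two standard error cases can be sketched as follows for Illustrations 6 and 7. This is an illustrative reading, not the authors' worked solution: Illustration 6 is treated as case (a) with the state SD of 11 kg, and Illustration 7 as case (b)(ii) with the sample SDs; the function names are assumptions.

```python
from math import sqrt


def se_common_sd(sigma, n1, n2):
    """Case (a): both samples drawn from a population with SD sigma."""
    return sqrt(sigma**2 * (1 / n1 + 1 / n2))


def se_separate_sd(s1, n1, s2, n2):
    """Case (b): each sample has its own SD."""
    return sqrt(s1**2 / n1 + s2**2 / n2)


# Illustration 6: district A vs district B, state SD = 11 kg
z6 = (210 - 220) / se_common_sd(11, 100, 150)

# Illustration 7: two samples of 200 villages with SDs 50 and 40
z7 = (485 - 510) / se_separate_sd(50, 200, 40, 200)

critical = 1.96  # two tail test at 5% LOS
for label, z in (("Illustration 6", z6), ("Illustration 7", z7)):
    print(f"{label}: z = {z:.2f}, reject H0: {abs(z) > critical}")
```

In both cases the magnitude of the computed z is well above 1.96, so under these assumptions the difference between the means is significant at 5% LOS.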
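Test of Significance of Small Sample
A sample is regarded as small if the sample size is less than or equal to 30.
Step 1: Setting up Ho & H1 as follows:
$H_0: \mu = \mu_0$ & $H_1: \mu \neq \mu_0$ (Two Tail Test)
$H_0: \mu = \mu_0$ & $H_1: \mu > \mu_0$ (Right Tail Test)
$H_0: \mu = \mu_0$ & $H_1: \mu < \mu_0$ (Left Tail Test)
where $\mu_0$ is the hypothesised value of the population mean.
Step 2: Appropriate test statistic to be used is the t test, when the
sample size is less than 30 and the population SD is unknown.
Step 3: Calculate Standard Error of mean as follows:
$$SE = \sqrt{\frac{\sum x^2}{n - 1}}, \quad x = X - \bar{X}$$
$$SE = \sqrt{\frac{n S^2}{n - 1}}, \quad S = \text{Sample SD}$$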
Step 4: Calculation of value of t as follows:
$$t = \frac{\bar{X} - \mu}{SE / \sqrt{n}}$$
Step 5: Calculation of degrees of freedom:
$\nu = n - 1$
Step 6: Look for the critical value of t at a given level
of significance (5% or 1%) and the calculated
degrees of freedom.
STEP 7: Interpretation:
Specify the decision as follows:
If the critical value (table value) is greater than the
computed value then we accept the null hypothesis
($H_0$).
If the critical value (table value) is less than the
computed value then we reject the null hypothesis
($H_0$).
Q1) A soap manufacturing company was distributing a particular
brand through a large number of retail shops. Before a
heavy advertising campaign, the mean sales per week per shop
was 140 dozens. After the campaign, a sample of 26 shops was
taken & the mean sales was found to be 147 dozens with SD 16.
Can you consider the advertisement effective? (2001-2002)

Q2) Six boys are selected at random from a school and their marks in
mathematics are found to be 63, 63, 64, 66, 60 and 68 out of 100. In the
light of these marks, discuss the general observation that the mean
marks in mathematics in the school were 66.
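A minimal sketch of the small sample steps applied to Q2 follows. It assumes a two tail test of $H_0: \mu = 66$ at 5% LOS; the critical value 2.571 (5 degrees of freedom) is taken from a standard t table and is not stated in the question.

```python
from math import sqrt
from statistics import mean, stdev

marks = [63, 63, 64, 66, 60, 68]
n = len(marks)

sample_mean = mean(marks)
# stdev() divides by (n - 1), matching sqrt(sum(x^2) / (n - 1)) with x = X - mean
sd_estimate = stdev(marks)

t = (sample_mean - 66) / (sd_estimate / sqrt(n))
dof = n - 1
critical = 2.571  # assumed table value: two tail, 5% LOS, 5 d.o.f.

accept_h0 = abs(t) < critical
print(f"mean = {sample_mean}, t = {t:.3f}, d.o.f. = {dof}, accept H0: {accept_h0}")
```

With these inputs the magnitude of t is below the table value, so the sample is consistent with a school mean of 66.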

Q3) The heights of ten children selected at random from a given
colony have a mean of 63.5 cm & a variance of 6.25 cm². Test at 5% LOS
the hypothesis that the children of the given colony are on average
less than 65 cm in all. (The value of t for 9 d.o.f. at 5% LOS is 2.262)
Hypothesis Test Concerning The Difference Between
Two Population Means
Step 1: Setting up the hypothesis as follows:
$H_0$: There is no significant difference between the
means of two samples i.e. $\mu_1 = \mu_2$
$H_1$: There is a significant difference between the
means of two samples i.e. $\mu_1 \neq \mu_2$
Step 2: Appropriate test statistic to be used is the t
test, when the sample size is less than 30 or the
population SD is unknown.
Step 3: Calculate Standard Error as follows:
$$SE = \sqrt{\frac{\sum x_1^2 + \sum x_2^2}{n_1 + n_2 - 2}} = \sqrt{\frac{n_1 S_1^2 + n_2 S_2^2}{n_1 + n_2 - 2}}$$
Step 4: Calculation of value of t as follows:
$$t = \frac{\bar{X}_1 - \bar{X}_2}{SE \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$$
Step 5: Calculation of degrees of freedom:
$\nu = n_1 + n_2 - 2$
Step 6: Look for the critical value of t at a given level
of significance (5% or 1%)
Step 7: Interpretation:
Specify the decision as follows:
If the critical value (table value) is greater than the
computed value then we accept the null hypothesis
($H_0$).
If the critical value (table value) is less than the
computed value then we reject the null hypothesis
($H_0$).
ILLUSTRATIONS
Q1) Two groups of students appeared in a test
examination and the marks obtained by them were as
follows:
Ist Group: 18, 20, 36, 50, 49, 36, 34, 49, 41
IInd Group: 29, 28, 26, 35, 30, 44, 46
Examine the significance of the difference between the mean
marks secured by the above two groups.
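The two sample t steps can be sketched for Q1 as follows. The helper `sum_sq_dev` and the 5% level are illustrative assumptions (the question does not state a level); the critical value 2.145 for 14 degrees of freedom is taken from a standard t table.

```python
from math import sqrt
from statistics import mean

group1 = [18, 20, 36, 50, 49, 36, 34, 49, 41]
group2 = [29, 28, 26, 35, 30, 44, 46]


def sum_sq_dev(values):
    """Sum of squared deviations from the sample mean (sum of x^2)."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


n1, n2 = len(group1), len(group2)
# Pooled estimate, labelled SE in Step 3
pooled = sqrt((sum_sq_dev(group1) + sum_sq_dev(group2)) / (n1 + n2 - 2))

t = (mean(group1) - mean(group2)) / (pooled * sqrt(1 / n1 + 1 / n2))
dof = n1 + n2 - 2
critical = 2.145  # assumed table value: two tail, 5% LOS, 14 d.o.f.

accept_h0 = abs(t) < critical
print(f"t = {t:.4f}, d.o.f. = {dof}, accept H0: {accept_h0}")
```

Under these assumptions the computed t is far below the table value, so the difference between the group means is not significant.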
Q2) Two salesmen A & B are employed by a company.
Recently, it conducted a sample survey yielding the
following data:

| | Salesman A | Salesman B |
| --- | --- | --- |
| No. of sales | 20 | 22 |
| Avg weekly sales (Rs in lakh) | 30 | 25 |
| Standard Deviation (Rs in lakh) | 10 | 7 |

Is there any significant difference between the average
sales of the two salesmen?
Chi Square Test
Chi square test is a measurement which tells about
the magnitude of difference between actual or
observed frequencies ($f_o$) and corresponding
theoretical frequencies ($f_e$).
Mathematically, it is expressed as:
$$\chi^2 = \sum \left[ \frac{(f_o - f_e)^2}{f_e} \right]$$
Properties:
It is a non - parametric test.
It is continuous probability distribution.
It is not symmetrical. It is skewed to the right.
It has only one parameter i.e. degree of freedom.
Its variance is 2 times d.o.f. (Variance = 2 d.o.f.)
Conditions for application of $\chi^2$-test:
Random samples
Independent observations
At least 50 observations

## Uses of Chi Square Test:

Test of Independence:
Chi square test is used to examine the association
or independence between two sets of attributes.
For example: to test whether there is any association
between the level of intelligence of fathers & sons or
whether both are independent.
Test of goodness of fit:
Chi square test is also used to determine whether
actual or observed frequencies correspond to any
specified theoretical frequency distribution such
as the Binomial, Poisson & Normal distributions.

Test of homogeneity
In this test it is determined whether two or more
independent random samples have been drawn
from the same population or not.
Steps Involved In Testing Independence of Attributes:
Step 1: Set up the hypothesis as follows:
$H_0$: No association exists between the attributes.
$H_1$: An association exists between the attributes.
Step 2: Calculate expected frequencies ($f_e$):
$$(f_e)_{ij} = \frac{R_i \times C_j}{n}$$
Where $R_i$ is the ith row total
$C_j$ is the jth column total
$n$ is the sample size
Step 3: Calculate value of Chi Square ($\chi^2$):
$$\chi^2 = \sum \left[ \frac{(f_o - f_e)^2}{f_e} \right]$$
| $f_o$ | $f_e$ | $f_o - f_e$ | $(f_o - f_e)^2$ | $(f_o - f_e)^2 / f_e$ |
| --- | --- | --- | --- | --- |
| | | | Total | $\chi^2 =$ |

Step 4: Calculation of degrees of freedom:
$\nu = (r - 1)(c - 1)$
Step 5: Look for the critical value of $\chi^2$ at a given level
of significance (5% or 1%)
STEP 6: Interpretation:
Specify the decision as follows:
If the critical value (table value) is greater than the
computed value then we accept the null hypothesis
($H_0$).
If the critical value (table value) is less than the
computed value then we reject the null hypothesis
($H_0$).
Steps Involved In Testing
Goodness Of Fit:
Step 1: Set up the hypothesis as follows:
$H_0: f_o = f_e$
$H_1: f_o \neq f_e$
Step 2: Calculate expected frequencies ($f_e$):
Calculate expected frequencies using the appropriate
theoretical distribution such as:
Binomial: Expected Frequency $= N \cdot P(r) = N \cdot {}^{n}C_r \, p^r q^{n-r}$
Poisson: Expected Frequency $= N \cdot P(x) = N \cdot \frac{e^{-m} m^x}{x!}$
Uniform (equal expected frequencies): Expected Frequency $= \frac{\text{Sum of Frequencies}}{\text{No. of Observations}}$
Step 3: Calculate value of Chi Square ($\chi^2$):
$$\chi^2 = \sum \left[ \frac{(f_o - f_e)^2}{f_e} \right]$$

| $f_o$ | $f_e$ | $f_o - f_e$ | $(f_o - f_e)^2$ | $(f_o - f_e)^2 / f_e$ |
| --- | --- | --- | --- | --- |
| | | | Total | $\chi^2 =$ |

Step 4: Calculation of degrees of freedom:
$\nu = n - 1$
Step 5: Look for the critical value of $\chi^2$ at a given level of
significance (5% or 1%).
STEP 6: Interpretation: Specify the decision as follows:
If the critical value (table value) is greater than the
computed value then we accept the null hypothesis ($H_0$).
If the critical value (table value) is less than the computed
value then we reject the null hypothesis ($H_0$).
Q1) 100 students of a management institute obtained
the following grades in the statistics paper.

| Grade | A | B | C | D | E | Total |
| --- | --- | --- | --- | --- | --- | --- |
| Frequency | 15 | 17 | 30 | 22 | 16 | 100 |

Using the $\chi^2$ test, examine the hypothesis that the
distribution of grades is uniform.
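The goodness of fit steps for Q1 can be sketched as below. Under the uniform hypothesis each grade has the same expected frequency; the 5% level and the critical value 9.488 (4 degrees of freedom, from a standard chi square table) are assumptions, since the question does not state them.

```python
observed = [15, 17, 30, 22, 16]  # grades A to E

# Uniform hypothesis: expected frequency = sum of frequencies / no. of classes
expected = [sum(observed) / len(observed)] * len(observed)

chi_sq = sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))
dof = len(observed) - 1
critical = 9.488  # assumed table value: 5% LOS, 4 d.o.f.

accept_h0 = chi_sq < critical
print(f"chi square = {chi_sq:.2f}, d.o.f. = {dof}, accept H0: {accept_h0}")
```

Under these assumptions the computed value is below the table value, so the hypothesis of a uniform grade distribution is accepted.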
Q2) A survey of 320 families with 5 children each
revealed the following distribution:

| No. of boys | 5 | 4 | 3 | 2 | 1 | 0 |
| --- | --- | --- | --- | --- | --- | --- |
| No. of girls | 0 | 1 | 2 | 3 | 4 | 5 |
| No. of families | 18 | 56 | 110 | 88 | 40 | 8 |

Given that the values of $\chi^2$ at 5 d.o.f. are 11 & 15.1 at
0.05 & 0.01 LOS respectively, test the hypothesis
that male & female births are equally probable.
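For Q2 the expected frequencies come from the binomial distribution with $p = q = 1/2$. The sketch below is one hedged way to carry out the steps; the variable names are illustrative, and the critical values are the ones quoted in the question.

```python
from math import comb

families = 320   # N
children = 5     # n
p = 0.5          # H0: male and female births are equally probable

# number of boys -> observed number of families
observed = {5: 18, 4: 56, 3: 110, 2: 88, 1: 40, 0: 8}

# Expected frequency = N * nCr * p^r * q^(n - r)
expected = {
    r: families * comb(children, r) * p**r * (1 - p) ** (children - r)
    for r in observed
}

chi_sq = sum((observed[r] - expected[r]) ** 2 / expected[r] for r in observed)

critical_5pct, critical_1pct = 11, 15.1  # values quoted in the question (5 d.o.f.)
print(f"chi square = {chi_sq:.2f}")
print("reject H0 at 5% LOS:", chi_sq > critical_5pct)
print("reject H0 at 1% LOS:", chi_sq > critical_1pct)
```

With the quoted table values, the computed statistic exceeds the 5% value but not the 1% value, so the conclusion depends on the level of significance chosen.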
Question 3
The employees in 4 different firms are
distributed in three skill categories as shown in
the following table. Test the hypothesis that
there is no relationship between the firm and
the type of labour. Let the level of significance
be 5%.

| Type of labour | Firm A | Firm B | Firm C | Firm D |
| --- | --- | --- | --- | --- |
| Skilled | 24 | 24 | 23 | 49 |
| Semi Skilled | 32 | 60 | 37 | 51 |
| Manual | 24 | 56 | 40 | 80 |
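The test of independence for Question 3 can be sketched as follows, computing each expected frequency as (row total × column total) / n. The critical value 12.592 (6 degrees of freedom at 5% LOS) is taken from a standard chi square table and is an assumption, as the question does not quote it.

```python
# Rows: Skilled, Semi Skilled, Manual; columns: firms A, B, C, D
observed = [
    [24, 24, 23, 49],
    [32, 60, 37, 51],
    [24, 56, 40, 80],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

chi_sq = 0.0
for i, row in enumerate(observed):
    for j, fo in enumerate(row):
        fe = row_totals[i] * col_totals[j] / n  # expected frequency
        chi_sq += (fo - fe) ** 2 / fe

dof = (len(observed) - 1) * (len(observed[0]) - 1)
critical = 12.592  # assumed table value: 5% LOS, 6 d.o.f.

reject_h0 = chi_sq > critical
print(f"chi square = {chi_sq:.3f}, d.o.f. = {dof}, reject H0: {reject_h0}")
```

The computed value is only slightly above the table value, so under these assumptions the hypothesis of no relationship is rejected at 5% LOS, but narrowly.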

Question 4
Five coins were tossed 3200 times and the
number of heads appearing each time is
noted as shown below:

| No. of heads | 0 | 1 | 2 | 3 | 4 | 5 |
| --- | --- | --- | --- | --- | --- | --- |
| Frequency | 80 | 570 | 1100 | 900 | 500 | 50 |

Using the Chi square test, examine the hypothesis
that all the coins are unbiased.
Question 5
The following table shows the number of
road accidents in a city that occurred
during the various days of a week:

| Days | Sun | Mon | Tue | Wed | Thu | Fri | Sat |
| --- | --- | --- | --- | --- | --- | --- | --- |
| No. of accidents | 14 | 16 | 8 | 12 | 11 | 9 | 14 |

Using the chi square test, determine whether the
accidents are uniformly distributed over the
different days of the week.
Question 6
By using the Chi square test, find out
whether there is any association
between income level and type of
schooling:

| Income | Public School | Govt. School |
| --- | --- | --- |
| High | 162 | 438 |
F - Test
The Analysis of Variance or F Test is a technique
developed by R A Fisher to test for the significance of the
difference among more than two sample means and to
make inferences about whether such samples are
drawn from populations having the same mean.
The F test is based on the ratio of variances rather than the
difference between them.
The F statistic is obtained by taking the ratio of unbiased
estimates of the population variances as follows:
$$F = \frac{\hat{\sigma}_1^2}{\hat{\sigma}_2^2} = \frac{n_1 S_1^2 / (n_1 - 1)}{n_2 S_2^2 / (n_2 - 1)}$$
$n_1$: size of the sample from the first population.
$n_2$: size of the sample from the second population.
$S_1$: standard deviation of the sample from the first population.
$S_2$: standard deviation of the sample from the second population.
Assumptions:
Samples are randomly drawn from the
population.
Samples are drawn from normally
distributed populations.
Populations from which the samples are
drawn have the same means & variances.
CLASSIFICATION MODEL:
(a) One Way Model
(b) Two Way Model

[Figure: an experiment on sales. One factor: salesman. Two factors: salesman and season.]
Practical Steps Involved In Preparation Of ANOVA
Table For One Factor Analysis Of Variance
STEP 1:
We set the null hypothesis and alternate hypothesis as:
$H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4 = \dots = \mu_n$
$H_1$: At least two means are not equal
STEP 2:
Calculate the sum of observations of each sample i.e.
$\sum X_1, \sum X_2, \sum X_3, \dots, \sum X_n$.
Now square the observations and obtain their total
for each sample i.e. $\sum X_1^2, \sum X_2^2, \sum X_3^2, \dots, \sum X_n^2$.
STEP 3:
Calculate the Correction Factor ($T^2/N$) as follows:
$$\frac{T^2}{N} = \frac{(\text{Sum of all observations of all samples})^2}{\text{Total no. of observations of all the samples}} = \frac{\left( \sum X_1 + \sum X_2 + \sum X_3 + \dots + \sum X_n \right)^2}{N}$$
STEP 4:
Calculate the Sum of Squares between samples (SSB) as
follows:
$$SSB = \frac{(\sum X_1)^2}{n_1} + \frac{(\sum X_2)^2}{n_2} + \frac{(\sum X_3)^2}{n_3} + \dots + \frac{(\sum X_n)^2}{n_n} - \frac{T^2}{N}$$
that is, the square of the sum of observations of each sample divided by the number of items in that sample, summed over all samples, minus the C.F.
STEP 5:
Calculate the Total Sum of Squares (SST) as follows:
$$SST = \left( \sum X_1^2 + \sum X_2^2 + \sum X_3^2 + \dots + \sum X_n^2 \right) - C.F.$$
STEP 6:
Calculate the Sum of Squares within samples (SSW) as
follows:
$SSW = SST - SSB$
STEP 7:
Preparation of ANOVA Table:

| Sources of Variation | Sum of Squares | d.o.f. | Mean Squares | Variance Ratio |
| --- | --- | --- | --- | --- |
| Between Samples | SSB | $c - 1$ | $MSB = \frac{SSB}{c - 1}$ | $F = \frac{MSB}{MSW}$ |
| Within Samples | SSW | $N - c$ | $MSW = \frac{SSW}{N - c}$ | |
| Total | SST | $N - 1$ | | |
Step 8:
Find the critical value at the given level of
significance.
Step 9:
Interpretation:
Compare the computed value of F with the table
value of F for the given level of significance and
interpret the same as follows:
Case (a):
If Critical Value > Calculated Value; Accept null
hypothesis
Case (b):
If Critical Value < Calculated Value; Reject null
hypothesis
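The one factor steps above can be sketched in Python. The data below are hypothetical numbers invented purely for illustration (they do not come from the slides), and the critical value 3.89 for (2, 12) degrees of freedom at 5% LOS is taken from a standard F table.

```python
# Hypothetical observations for three samples (illustrative only)
samples = {
    "sample 1": [8, 10, 7, 14, 11],
    "sample 2": [7, 5, 10, 9, 9],
    "sample 3": [12, 9, 13, 12, 14],
}
groups = list(samples.values())

c = len(groups)                      # number of samples
N = sum(len(g) for g in groups)      # total number of observations
T = sum(sum(g) for g in groups)      # grand total

cf = T**2 / N                                          # STEP 3: correction factor
ssb = sum(sum(g) ** 2 / len(g) for g in groups) - cf   # STEP 4
sst = sum(x**2 for g in groups for x in g) - cf        # STEP 5
ssw = sst - ssb                                        # STEP 6

msb = ssb / (c - 1)
msw = ssw / (N - c)
f_ratio = msb / msw

critical = 3.89  # assumed table value: F(2, 12) at 5% LOS
print(f"SSB = {ssb}, SSW = {ssw}, SST = {sst}, F = {f_ratio:.2f}")
print("reject H0:", f_ratio > critical)
```

For these made-up figures the computed F is slightly above the table value, which corresponds to Case (b).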
Practical Steps Involved In Preparation Of
ANOVA Table For Two Way Model
STEP 1:
We set the null hypothesis and alternate hypothesis as:
$H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4 = \dots = \mu_n$
$H_1$: At least two means are not equal
STEP 2:
Calculate the sum of observations of each row and
each column.
Now square the observations and obtain their total
for each sample.
STEP 3:
Calculate the Correction Factor ($T^2/N$) as follows:
$$\frac{T^2}{N} = \frac{(\text{Sum of all observations of all samples})^2}{\text{Total no. of observations of all the samples}} = \frac{\left( \sum X_1 + \sum X_2 + \sum X_3 + \dots + \sum X_n \right)^2}{N}$$
STEP 4:
Calculate the Sum of Squares between columns (SSC) as:
$$SSC = \frac{(\sum X_1)^2}{n_1} + \frac{(\sum X_2)^2}{n_2} + \frac{(\sum X_3)^2}{n_3} + \dots + \frac{(\sum X_n)^2}{n_n} - \frac{T^2}{N}$$
that is, the square of the total of each column divided by the number of items in that column, summed over all columns, minus the C.F.
STEP 5:
Calculate the Sum of Squares between rows (SSR) as:
$$SSR = \frac{(\sum Y_1)^2}{r_1} + \frac{(\sum Y_2)^2}{r_2} + \frac{(\sum Y_3)^2}{r_3} + \dots + \frac{(\sum Y_n)^2}{r_n} - \frac{T^2}{N}$$
that is, the square of the total of each row divided by the number of items in that row, summed over all rows, minus the C.F.
STEP 6:
Calculate the Total Sum of Squares (SST) as
follows:
SST = Sum of squares of all observations - C.F.
$$SST = \left( \sum X_1^2 + \sum X_2^2 + \sum X_3^2 + \dots + \sum X_n^2 \right) - C.F.$$
STEP 7:
Calculate the Sum of Squares for the Residual
Error (SSE) as:
$SSE = SST - (SSC + SSR)$
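STEP 8:
Preparation of ANOVA Table:

| Sources of Variation | Sum of Squares | Degree of Freedom | Mean Squares | Variance Ratio | Critical Value |
| --- | --- | --- | --- | --- | --- |
| Between Columns | SSC | $c - 1$ | $MSC = \frac{SSC}{c - 1}$ | $F_1 = \frac{MSC}{MSE}$ | $F_1$ |
| Between Rows | SSR | $r - 1$ | $MSR = \frac{SSR}{r - 1}$ | $F_2 = \frac{MSR}{MSE}$ | $F_2$ |
| Residual Error | SSE | $(c - 1)(r - 1)$ | $MSE = \frac{SSE}{(r - 1)(c - 1)}$ | | |
| Total | SST | $rc - 1$ | | | |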
Step 9:
Interpretation:
Compare the computed value of F with the table
value of F for the given level of significance and
interpret the same as follows:
Case (a):
If Critical Value > Calculated Value; Accept null
hypothesis
Case (b):
If Critical Value < Calculated Value; Reject null
hypothesis
Illustration
A tea company appoints four salesmen A, B, C
& D and observes their sales performance in
three seasons of the year viz. Summer,
Monsoon & Winter. The figures of sales, in lakh
of Rs, are given in the following table. Carry
out an analysis of variance and interpret.

| Season | Salesman A | Salesman B | Salesman C | Salesman D |
| --- | --- | --- | --- | --- |
| Summer | 56 | 56 | 41 | 55 |
| Monsoon | 46 | 48 | 49 | 49 |
| Winter | 48 | 49 | 51 | 52 |
| Total | 150 | 153 | 141 | 156 |
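The two way steps can be sketched for this illustration as follows, with salesmen as columns and seasons as rows. The 5% level and the critical values 4.757 for d.f. (3, 6) and 5.143 for d.f. (2, 6), from a standard F table, are assumptions, since the illustration states neither.

```python
sales = {
    "Summer": [56, 56, 41, 55],
    "Monsoon": [46, 48, 49, 49],
    "Winter": [48, 49, 51, 52],
}
rows = list(sales.values())
r, c = len(rows), len(rows[0])   # 3 seasons, 4 salesmen
N = r * c
T = sum(sum(row) for row in rows)

cf = T**2 / N                                           # correction factor
ssc = sum(sum(col) ** 2 for col in zip(*rows)) / r - cf  # between columns (salesmen)
ssr = sum(sum(row) ** 2 for row in rows) / c - cf        # between rows (seasons)
sst = sum(x**2 for row in rows for x in row) - cf        # total
sse = sst - (ssc + ssr)                                  # residual error

msc = ssc / (c - 1)
msr = ssr / (r - 1)
mse = sse / ((r - 1) * (c - 1))

f_columns = msc / mse   # F1, d.f. (3, 6)
f_rows = msr / mse      # F2, d.f. (2, 6)

print(f"SSC = {ssc}, SSR = {ssr}, SSE = {sse}, SST = {sst}")
print(f"F1 = {f_columns:.3f} (critical 4.757), F2 = {f_rows:.3f} (critical 5.143)")
```

Both computed ratios fall below their table values, so under these assumptions neither the salesmen nor the seasons differ significantly in mean sales.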
Illustrations
Q1) Three varieties A, B, and C of crops are tested
in a randomised block design with four
replications. The plot yields in kgs are given in the
following table. Analyse the experimental yield
and state your conclusions at 5% level of
significance.

| A 6 | C 5 | A 8 | B 9 |
| --- | --- | --- | --- |
| C 8 | A 4 | B 6 | C 9 |
| B 6 | B 7 | C 10 | A 6 |

The table value of F is as follows:
For d.f. (2,6): 5.143 and for d.f. (3,6): 4.757
Remaining Questions Of Probability
Illustration
The probability that a boy will solve the
problem is 3/4 & that a girl will solve the
problem is 4/5. Find the probability that

d) At least one of them will solve the problem
Illustration
A bag contains 3 black, 4 white and 5 red balls. One
ball is drawn at random. Find the probability that :
a) it is either a black ball or non white ball
b) it is either a white ball or non red ball
c) it is either a red ball or non black ball

Illustration
A die is thrown. What is the probability of getting:
a) a multiple of 2 or 3
b) a multiple of 2 or 4
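A small enumeration sketch for the die illustration, assuming a fair six-sided die; the helper name `probability` is illustrative.

```python
from fractions import Fraction

faces = range(1, 7)  # fair six-sided die


def probability(event):
    """Favourable outcomes divided by total outcomes."""
    favourable = [f for f in faces if event(f)]
    return Fraction(len(favourable), len(faces))


p_a = probability(lambda f: f % 2 == 0 or f % 3 == 0)  # faces 2, 3, 4, 6
p_b = probability(lambda f: f % 2 == 0 or f % 4 == 0)  # faces 2, 4, 6

print("a) multiple of 2 or 3:", p_a)
print("b) multiple of 2 or 4:", p_b)
```

In part (b) every multiple of 4 is already a multiple of 2, so the probability is the same as that of an even face.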

Illustration
X can solve 80 percent of the problems given in a book and
Y can solve 60 percent. What is the probability that:
a) both will solve a problem
b) none will be able to solve a problem
c) the problem will be solved
d) at least one of them will not be able to solve the problem
e) only one of them will solve the problem.
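The five parts of this illustration can be checked with exact fractions. The sketch assumes that X and Y attempt the problem independently, which the question implies but does not state.

```python
from fractions import Fraction

p_x = Fraction(80, 100)  # X solves the problem
p_y = Fraction(60, 100)  # Y solves the problem

both = p_x * p_y                               # a) both solve it
none = (1 - p_x) * (1 - p_y)                   # b) neither solves it
solved = 1 - none                              # c) at least one solves it
at_least_one_fails = 1 - both                  # d) at least one cannot solve it
only_one = p_x * (1 - p_y) + (1 - p_x) * p_y   # e) exactly one solves it

for label, value in [
    ("a", both), ("b", none), ("c", solved),
    ("d", at_least_one_fails), ("e", only_one),
]:
    print(f"{label}) {value} = {float(value):.2f}")
```

Parts (c) and (d) use the complement rule, which avoids adding up several separate cases.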

Illustration
X can solve 3 problems out of 5, Y can solve 2 out of 5
and Z can solve 3 out of 4. What is the probability that:
a) the problem will be solved
b) only two of them will solve a problem
c) at least two of them will solve a problem