
MEASURES OF CENTRAL TENDENCY

UNGROUPED DATA
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$

GROUPED DATA
$$\bar{x} = \frac{\sum_{i=1}^{n} f_i x_i}{N}, \qquad N = \sum_{i=1}^{n} f_i$$
f_i are frequencies
x_i are mid values
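The two mean formulas can be checked with a short Python sketch. The function names are illustrative only, and the grouped call uses hypothetical mid values and frequencies.

```python
from typing import Sequence


def mean_ungrouped(xs: Sequence[float]) -> float:
    """Arithmetic mean of raw observations: sum(x_i) / n."""
    return sum(xs) / len(xs)


def mean_grouped(mids: Sequence[float], freqs: Sequence[float]) -> float:
    """Mean of a frequency table: sum(f_i * x_i) / N, where N = sum(f_i)."""
    total_freq = sum(freqs)
    return sum(f * x for f, x in zip(freqs, mids)) / total_freq


# Hypothetical classes with mid values 5, 15, 25 and frequencies 2, 3, 5
print(mean_grouped([5, 15, 25], [2, 3, 5]))  # 18.0
```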

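UNGROUPED DATA
Step 1: Arrange the given data in ascending order.
Step 2: Count the number of observations, n.
Step 3:
Case I: If the number of observations is odd,
$$\text{median} = \left(\frac{n+1}{2}\right)^{\text{th}} \text{ element in the ascending order}$$
Case II: If the number of observations is even, the median is the mean of the two middle elements in the ascending order:
$$\text{median} = \frac{1}{2}\left[\left(\frac{n}{2}\right)^{\text{th}} \text{ element} + \left(\frac{n}{2}+1\right)^{\text{th}} \text{ element}\right]$$

GROUPED DATA
$$\text{median} = l + \frac{\frac{N}{2} - m}{f} \times h$$
N = total frequency
l = lower bound of the median class
m = cumulative frequency of the classes before the median class
f = frequency of the median class
h = width of the class interval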
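A minimal Python sketch of both median rules follows. It assumes equal-width classes given by their lower bounds. The function names and the grouped table are hypothetical.

```python
from typing import Sequence


def median_ungrouped(xs: Sequence[float]) -> float:
    s = sorted(xs)          # Step 1: ascending order
    n = len(s)              # Step 2: number of observations
    if n % 2 == 1:          # Case I: ((n + 1)/2)th element (1-based position)
        return s[(n + 1) // 2 - 1]
    # Case II: average of the (n/2)th and (n/2 + 1)th elements
    return (s[n // 2 - 1] + s[n // 2]) / 2


def median_grouped(lowers: Sequence[float], h: float, freqs: Sequence[float]) -> float:
    """median = l + (N/2 - m) / f * h for equal-width classes."""
    half = sum(freqs) / 2
    cum = 0                  # m: cumulative frequency before the current class
    for l, f in zip(lowers, freqs):
        if cum + f >= half:  # first class whose cumulative frequency reaches N/2
            return l + (half - cum) / f * h
        cum += f
    raise ValueError("empty frequency table")


# Hypothetical classes 0-10, 10-20, 20-30, 30-40 with frequencies 5, 8, 12, 5
print(round(median_grouped([0, 10, 20, 30], 10, [5, 8, 12, 5]), 4))  # 21.6667
```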

UNGROUPED DATA
An observation which repeats the greatest number of times is called the modal value.

GROUPED DATA
$$\text{mode} = l + \frac{f - f_0}{2f - f_0 - f_1} \times h$$
f = maximum frequency
f0 = frequency of the class just above the maximum frequency (the preceding class)
f1 = frequency of the class just below the maximum frequency (the succeeding class)
l = lower bound of the class where the maximum frequency lies
h = width of the class interval
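The sketch below applies the grouped-data mode formula and uses the standard-library `statistics.multimode` for raw data. It is an illustration under stated assumptions: classes have equal width, and a modal class at either end of the table has a missing neighbouring frequency treated as 0. The function name and table are hypothetical.

```python
from statistics import multimode
from typing import Sequence


def mode_grouped(lowers: Sequence[float], h: float, freqs: Sequence[float]) -> float:
    """mode = l + (f - f0) / (2f - f0 - f1) * h."""
    k = max(range(len(freqs)), key=lambda i: freqs[i])  # modal class (first maximum)
    f = freqs[k]
    # Assumption: a missing neighbouring class (modal class at either end) counts as 0
    f0 = freqs[k - 1] if k > 0 else 0
    f1 = freqs[k + 1] if k + 1 < len(freqs) else 0
    return lowers[k] + (f - f0) / (2 * f - f0 - f1) * h


print(multimode([2, 3, 3, 5, 7, 3, 5]))                            # [3]
print(round(mode_grouped([0, 10, 20, 30], 10, [5, 8, 12, 5]), 4))  # 23.6364
```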
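UNGROUPED DATA
$$G.M = \text{Antilog}\left(\frac{\sum_{i=1}^{n} \log x_i}{n}\right)$$

GROUPED DATA
$$G.M = \text{Antilog}\left(\frac{\sum_{i=1}^{n} f_i \log x_i}{N}\right)$$
f_i are frequencies
x_i are mid values
N = total frequency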
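A small Python sketch of the geometric-mean formulas follows. It uses base-10 logarithms, so "Antilog" means raising 10 to that power, as with log tables. The function names are illustrative.

```python
import math
from typing import Sequence


def gm_ungrouped(xs: Sequence[float]) -> float:
    """G.M = antilog(sum(log x_i) / n), using base-10 logs."""
    return 10 ** (sum(math.log10(x) for x in xs) / len(xs))


def gm_grouped(mids: Sequence[float], freqs: Sequence[float]) -> float:
    """G.M = antilog(sum(f_i * log x_i) / N)."""
    total_freq = sum(freqs)
    return 10 ** (sum(f * math.log10(x) for f, x in zip(freqs, mids)) / total_freq)


print(round(gm_ungrouped([2, 8]), 6))          # 4.0
print(round(gm_grouped([1, 100], [1, 1]), 6))  # 10.0
```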

UNGROUPED DATA
$$H.M = \frac{n}{\sum_{i=1}^{n} \frac{1}{x_i}}$$

GROUPED DATA
$$H.M = \frac{N}{\sum_{i=1}^{n} \frac{f_i}{x_i}}$$
f_i are frequencies
x_i are mid values
N = total frequency
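The harmonic-mean formulas translate directly into Python. The names below are illustrative, and the values are chosen only so the result is easy to check by hand: 2 / (1/40 + 1/60) = 48.

```python
from typing import Sequence


def hm_ungrouped(xs: Sequence[float]) -> float:
    """H.M = n / sum(1 / x_i)."""
    return len(xs) / sum(1 / x for x in xs)


def hm_grouped(mids: Sequence[float], freqs: Sequence[float]) -> float:
    """H.M = N / sum(f_i / x_i)."""
    return sum(freqs) / sum(f / x for f, x in zip(freqs, mids))


print(round(hm_ungrouped([40, 60]), 6))  # 48.0
```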
MEASURES OF DISPERSION

UNGROUPED DATA
Range = Max. value - Min. value

GROUPED DATA
Range = Biggest upper bound - Smallest lower bound

UNGROUPED DATA
Step 1: Arrange the given data in ascending order.
Step 2: Count the number of observations, n.
Step 3:
Case I: If the number of observations is odd,
$$Q_1 = \left(\frac{n+1}{4}\right)^{\text{th}} \text{ element}, \qquad Q_3 = \left(\frac{3(n+1)}{4}\right)^{\text{th}} \text{ element in ascending order}$$
Case II: If the number of observations is even,
$$Q_1 = \left(\frac{n}{4}\right)^{\text{th}} \text{ element}, \qquad Q_3 = \left(\frac{3n}{4}\right)^{\text{th}} \text{ element in ascending order}$$
Step 4:
$$Q.D = \frac{Q_3 - Q_1}{2}$$

GROUPED DATA
$$Q_1 = l_1 + \frac{\frac{N}{4} - m_1}{f_1} \times h, \qquad Q_3 = l_3 + \frac{\frac{3N}{4} - m_3}{f_3} \times h$$
N = total frequency
l1 = lower bound of the (N/4)th class, l3 = lower bound of the (3N/4)th class
m1, m3 = cumulative frequency of the classes before the (N/4)th and the (3N/4)th class, respectively
f1, f3 = frequency of the (N/4)th and the (3N/4)th class, respectively
h = width of the class interval
$$Q.D = \frac{Q_3 - Q_1}{2}$$
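Locating the (N/4)th and (3N/4)th classes works the same way as locating the median class. Below is a hedged Python sketch that assumes equal-width classes and is checked on a hypothetical frequency table. The function names are illustrative.

```python
from typing import Sequence


def quartile_grouped(lowers: Sequence[float], h: float, freqs: Sequence[float], k: int) -> float:
    """Q_k = l_k + (k*N/4 - m_k) / f_k * h for k = 1 or 3 (equal-width classes)."""
    target = k * sum(freqs) / 4
    cum = 0                      # m_k: cumulative frequency before the current class
    for l, f in zip(lowers, freqs):
        if cum + f >= target:    # this is the (kN/4)th class
            return l + (target - cum) / f * h
        cum += f
    raise ValueError("empty frequency table")


def quartile_deviation_grouped(lowers, h, freqs) -> float:
    q1 = quartile_grouped(lowers, h, freqs, 1)
    q3 = quartile_grouped(lowers, h, freqs, 3)
    return (q3 - q1) / 2


# Hypothetical classes 0-10, 10-20, 20-30, 30-40 with frequencies 5, 8, 12, 5
print(round(quartile_deviation_grouped([0, 10, 20, 30], 10, [5, 8, 12, 5]), 4))  # 7.3958
```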
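UNGROUPED DATA
$$M.D = \frac{\sum_{i=1}^{n} |x_i - \bar{x}|}{n}, \quad \text{where } \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$

GROUPED DATA
$$M.D = \frac{\sum_{i=1}^{n} f_i |x_i - \bar{x}|}{N}, \quad \text{where } \bar{x} = \frac{\sum_{i=1}^{n} f_i x_i}{N}$$
f_i are frequencies
x_i are mid values
N = total frequency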
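Here is the mean deviation about the mean, sketched in Python. The names are illustrative, and the grouped call uses a hypothetical two-class table.

```python
from typing import Sequence


def md_ungrouped(xs: Sequence[float]) -> float:
    """M.D about the mean: sum(|x_i - mean|) / n."""
    n = len(xs)
    xbar = sum(xs) / n
    return sum(abs(x - xbar) for x in xs) / n


def md_grouped(mids: Sequence[float], freqs: Sequence[float]) -> float:
    """M.D about the mean: sum(f_i * |x_i - mean|) / N."""
    total_freq = sum(freqs)
    xbar = sum(f * x for f, x in zip(freqs, mids)) / total_freq
    return sum(f * abs(x - xbar) for f, x in zip(freqs, mids)) / total_freq


print(md_grouped([10, 20], [1, 3]))  # 3.75
```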

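UNGROUPED DATA
$$S.D = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n}}, \quad \text{where } \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$

GROUPED DATA
$$S.D = \sqrt{\frac{\sum_{i=1}^{n} f_i (x_i - \bar{x})^2}{N}}, \quad \text{where } \bar{x} = \frac{\sum_{i=1}^{n} f_i x_i}{N}$$
f_i are frequencies
x_i are mid values
N = total frequency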
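The Python sketch below implements both S.D formulas. These use divisor n (N for grouped data), which matches the standard-library `statistics.pstdev`. By contrast, `statistics.stdev` uses n - 1, as the t-tests later in these notes do. The function names are illustrative.

```python
import math
import statistics
from typing import Sequence


def sd_ungrouped(xs: Sequence[float]) -> float:
    """S.D with divisor n, as in the formula above."""
    n = len(xs)
    xbar = sum(xs) / n
    return math.sqrt(sum((x - xbar) ** 2 for x in xs) / n)


def sd_grouped(mids: Sequence[float], freqs: Sequence[float]) -> float:
    """S.D = sqrt(sum(f_i * (x_i - mean)^2) / N)."""
    total_freq = sum(freqs)
    xbar = sum(f * x for f, x in zip(freqs, mids)) / total_freq
    return math.sqrt(sum(f * (x - xbar) ** 2 for f, x in zip(freqs, mids)) / total_freq)


data = [2, 4, 4, 4, 5, 5, 7, 9]
print(sd_ungrouped(data), statistics.pstdev(data))   # 2.0 2.0
print(sd_grouped([2, 4, 5, 7, 9], [1, 3, 2, 1, 1]))  # 2.0
```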

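UNGROUPED DATA
$$\text{mean} = \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}, \qquad S.D = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n}}$$
$$C.V = \frac{S.D}{\text{mean}} \times 100$$

GROUPED DATA
$$\text{mean} = \bar{x} = \frac{\sum_{i=1}^{n} f_i x_i}{N}, \qquad S.D = \sqrt{\frac{\sum_{i=1}^{n} f_i (x_i - \bar{x})^2}{N}}$$
$$C.V = \frac{S.D}{\text{mean}} \times 100$$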
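The C.V combines the two previous results. The helper name below is illustrative, and it uses the divisor-n S.D from `statistics.pstdev`.

```python
import statistics
from typing import Sequence


def coefficient_of_variation(xs: Sequence[float]) -> float:
    """C.V = (S.D / mean) * 100, with the divisor-n S.D used above."""
    return statistics.pstdev(xs) / statistics.mean(xs) * 100


print(round(coefficient_of_variation([2, 4, 4, 4, 5, 5, 7, 9]), 2))  # 40.0
```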
TESTING OF HYPOTHESIS

Population: Any set of objects under study in a statistical investigation is called a population.
Eg:- If we want to study the average lifetime of electric bulbs, the population is the set of all such electric bulbs.

Sample: A finite subset of the population is known as a sample, and the number of elements in the sample is known as the sample size.

Sampling error: The error involved in approximating population characteristics on the basis of a sample is known as sampling error. It is inherent and unavoidable in any sampling scheme.

Parameter: Any statistical constant related to the population is known as a parameter.
Eg:- i) Population mean μ ii) Population variance σ²

Statistic: Any statistical constant computed from the sample is known as a statistic.
Eg:- i) Sample mean x̄ ii) Sample variance s²

Tests of significance: A very important aspect of sampling theory is the study of tests of significance, which enable us to decide, on the basis of the sample, whether
i) the deviation between the observed sample statistic and the hypothetical parameter value is significant or not;
ii) the deviation between two sample statistics is significant or not.

Null hypothesis: To apply a test of significance, we first set up a hypothesis, i.e., a definite statement about the population parameter. Such a hypothesis, which is usually a hypothesis of no difference, is called the null hypothesis.
According to R.A. Fisher, the null hypothesis is the hypothesis which is tested for possible rejection under the assumption that it is true. It is usually denoted by H0.
Alternative hypothesis: Any hypothesis which is complementary to the null hypothesis is called an alternative hypothesis. It is usually denoted by H1.

Degrees of freedom: The number of degrees of freedom is the number of independent observations in a set. For example, once the mean of n observations is fixed, only n - 1 of the deviations x_i - x̄ are free to vary. In a test of hypothesis, a sample is drawn from the population whose parameter is under test. The size of the sample varies, since it depends either on the experimenter or on the resources available. Moreover, the test statistic involves the estimated value of the parameter, which depends on the number of observations. Hence the sample size plays an important role in testing of hypothesis and is taken care of by the degrees of freedom.

Distinguish between Type I and Type II errors in testing a hypothesis.

                    Decision from sample
                    Reject H0                 Accept H0
H0 true             Wrong (Type I error)      Correct
H0 false            Correct                   Wrong (Type II error)

Type I error: Rejecting H0 when H0 is true.

Type II error: Accepting H0 when H0 is false.

Level of significance (α): The probability of a Type I error is known as the level of significance of the test. It is also called the size of the critical region.

Point estimation: If, from the available sample information, we can find a statistic whose value is close to the true value of the parameter, the method of finding such a statistic is known as point estimation.
Eg:- The sample mean is a point estimate of the population mean.

Interval estimation: If, from the available sample information, we can find an interval in which a parameter is expected to lie with a specified probability, the method of finding such an interval is called interval estimation.
Eg:- The 95% confidence interval for the population mean μ (when the population S.D. σ is known) is
$$\left(\bar{x} - 1.96\frac{\sigma}{\sqrt{n}},\ \bar{x} + 1.96\frac{\sigma}{\sqrt{n}}\right)$$
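The 95% interval generalises to other confidence levels by replacing 1.96 with the matching standard normal point. Below is a minimal sketch using `statistics.NormalDist` (Python 3.8+), assuming σ is known. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def mean_confidence_interval(xbar: float, sigma: float, n: int, level: float = 0.95):
    """x̄ ± z * sigma / sqrt(n); z is about 1.96 for a 95% interval."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # upper (1 - level)/2 point of N(0, 1)
    half_width = z * sigma / math.sqrt(n)
    return xbar - half_width, xbar + half_width


# Figures from the large-sample height example in these notes (n = 400, x̄ = 67.47, σ = 1.3)
low, high = mean_confidence_interval(67.47, 1.3, 400)
print(round(low, 3), round(high, 3))  # 67.343 67.597
```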

Procedure for testing the hypothesis:

Step 1: Null hypothesis: Set up the null hypothesis H0.
Step 2: Alternative hypothesis: Set up the alternative hypothesis H1.
Step 3: Level of significance: Choose an appropriate level of significance, fixed in advance.
Step 4: Test statistic: Compute the test statistic under H0.
Step 5: Inference: Compare the computed value with the table (critical) value. If the absolute computed value is less than or equal to the table/critical value, we accept H0. Otherwise we reject H0.

Critical values:

Critical values of Z      Level of significance (α)
                          1%        5%        10%
Two-tailed test           2.58      1.96      1.645
Right-tailed test         2.33      1.645     1.28
Left-tailed test          -2.33     -1.645    -1.28
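The tabulated Z values are rounded quantiles of the standard normal distribution, and `statistics.NormalDist.inv_cdf` can reproduce them. Below is a minimal sketch; the helper name `critical_z` is hypothetical.

```python
from statistics import NormalDist


def critical_z(alpha: float, tail: str) -> float:
    """Critical value of Z for a 'two', 'right' or 'left' tailed test at level alpha."""
    z = NormalDist()
    if tail == "two":
        return z.inv_cdf(1 - alpha / 2)  # reject H0 if |Z| exceeds this value
    if tail == "right":
        return z.inv_cdf(1 - alpha)      # reject H0 if Z exceeds this value
    if tail == "left":
        return z.inv_cdf(alpha)          # negative; reject H0 if Z falls below it
    raise ValueError("tail must be 'two', 'right' or 'left'")


for alpha in (0.01, 0.05, 0.10):
    print(f"{alpha:.0%}", *(round(critical_z(alpha, t), 3) for t in ("two", "right", "left")))
```

The printed values (for example 2.576, 2.326 and 1.282) round to the entries in the table.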

t-test
Test for Single mean:

H0: μ = μ0
H1: μ ≠ μ0 (or) μ > μ0 (or) μ < μ0
[Any one of these 3 alternatives, depending on the given problem]
Level of significance:
Appropriate level of significance is α (given/chosen)
Test Statistic:
To test the above hypotheses the test statistic is given by
$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}} \sim t_{n-1} \text{ at } \alpha \text{ level of significance}, \qquad s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n-1}}$$
Inference:
1) If it is a two-tailed test (≠), reject H0 if |t_cal| > t_{α/2, n-1}. Otherwise accept H0.
2) If it is a right-tailed test (>), reject H0 if t_cal > t_{α, n-1}. Otherwise accept H0.
3) If it is a left-tailed test (<), reject H0 if t_cal < -t_{α, n-1}. Otherwise accept H0.

Example:- Ten individuals are chosen from a normal population, and their heights (in inches) are given below. Test at 5% level of significance whether the sample comes from a normal population whose mean height is 66 inches.
63, 63, 66, 67, 68, 69, 70, 70, 71, 71.
Sol:-
H0: μ = 66
H1: μ ≠ 66
Level of significance:
Appropriate level of significance is 5% (given)
Test Statistic:
To test the above hypotheses the test statistic is given by

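$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}} \sim t_{n-1} \text{ at } \alpha \text{ level of significance}$$
$$\bar{x} = \frac{\sum x_i}{n} = 67.8, \qquad s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n-1}} = 3.0111$$
Under H0, the test statistic is given by
$$t = \frac{67.8 - 66}{3.0111/\sqrt{10}} = 1.89 \sim t_9 \text{ at 5\% level of significance}$$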
Inference:
The tabulated value of t at 5% level of significance for 9
degrees of freedom in a two tailed test is 2.262.
[t_{α/2, n-1} = t_{0.05/2, 10-1} = t_{0.025, 9} = 2.262]
Here, |t_cal| < t_{α/2, n-1}, so we accept H0. Hence we conclude that the sample comes from a population whose mean height is 66 inches.
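The worked example can be reproduced in Python. The standard library has no t-distribution quantile, so the table value 2.262 is hard-coded as an assumption. `statistics.stdev` uses the n - 1 divisor required for s. The function name is illustrative.

```python
import math
import statistics


def t_single_mean(xs, mu0):
    """t = (x̄ - mu0) / (s / sqrt(n)), s with divisor n - 1; returns (t, d.f.)."""
    n = len(xs)
    t = (statistics.mean(xs) - mu0) / (statistics.stdev(xs) / math.sqrt(n))
    return t, n - 1


heights = [63, 63, 66, 67, 68, 69, 70, 70, 71, 71]
t_cal, df = t_single_mean(heights, 66)
T_TAB = 2.262  # t_{0.025, 9} copied from the t-table (assumed input)
print(round(t_cal, 2), df, abs(t_cal) > T_TAB)  # 1.89 9 False
```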

Test for equality of Two means (n1 ≠ n2):

H0: μx = μy
H1: μx ≠ μy (or) μx > μy (or) μx < μy
[Any one of these 3 alternatives, depending on the given problem]
Level of significance:
Appropriate level of significance is α (given/chosen)
Test Statistic:
To test the above hypotheses the test statistic is given by
$$t = \frac{(\bar{x} - \bar{y}) - (\mu_x - \mu_y)}{S\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} \sim t_{n_1+n_2-2} \text{ at } \alpha \text{ level of significance}$$
where
$$S = \sqrt{\frac{\sum (x_i - \bar{x})^2 + \sum (y_i - \bar{y})^2}{n_1 + n_2 - 2}}$$
Inference:
1) In a two-tailed test (≠), reject H0 if |t_cal| > t_{α/2, n1+n2-2}. Otherwise accept H0.
2) In a right-tailed test (>), reject H0 if t_cal > t_{α, n1+n2-2}. Otherwise accept H0.
3) In a left-tailed test (<), reject H0 if t_cal < -t_{α, n1+n2-2}. Otherwise accept H0.

Example:- The following random samples are measurements of the heat-producing capacity (in millions of calories per ton) of specimens of coal from two mines.
Mine 1:  8260  8130  8350  8070  8340
Mine 2:  7950  7890  7900  8140  7920  7840
Use the 0.01 level of significance to test whether the difference between the means of these two samples is significant.
Sol:-
H0: μx = μy
H1: μx ≠ μy
Level of significance:
Appropriate level of significance is 1% (given)
Test Statistic:
To test the above hypotheses the test statistic is given by

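$$t = \frac{(\bar{x} - \bar{y}) - (\mu_x - \mu_y)}{S\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} \sim t_{n_1+n_2-2} \text{ at } \alpha \text{ level of significance}$$
$$\bar{x} = \frac{\sum x_i}{n_1} = 8230, \qquad \bar{y} = \frac{\sum y_i}{n_2} = 7940$$
$$S = \sqrt{\frac{\sum (x_i - \bar{x})^2 + \sum (y_i - \bar{y})^2}{n_1 + n_2 - 2}} = \sqrt{\frac{63000 + 54600}{9}} = 114.31$$
Under H0, the test statistic is given by
$$t = \frac{(8230 - 7940) - 0}{114.31\sqrt{\frac{1}{5} + \frac{1}{6}}} = 4.19 \sim t_9 \text{ at 1\% level of significance}$$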
Inference:
The tabulated value of t at 1% level of significance for 9
degrees of freedom in a two tailed test is 3.250.
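[t_{α/2, n1+n2-2} = t_{0.01/2, 9} = t_{0.005, 9} = 3.250]
Here, |t_cal| > t_tab, so we reject H0. Hence the mean heat-producing capacities of coal from the two mines differ significantly.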

Example:- The heights of 6 randomly chosen sailors are 63, 65, 68, 69, 71, 72 inches. Those of 10 randomly chosen soldiers are 61, 62, 65, 66, 69, 69, 70, 71, 72, 73 inches. Discuss whether these data suggest that sailors are taller than soldiers.
Sol:-
H0: μx = μy
H1: μx > μy
Level of significance:
Appropriate level of significance is 5% (chosen)
Test Statistic:
To test the above hypotheses the test statistic is given by

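$$t = \frac{(\bar{x} - \bar{y}) - (\mu_x - \mu_y)}{S\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} \sim t_{n_1+n_2-2} \text{ at } \alpha \text{ level of significance}$$
$$\bar{x} = \frac{\sum x_i}{n_1} = 68, \qquad \bar{y} = \frac{\sum y_i}{n_2} = 67.8$$
$$S = \sqrt{\frac{\sum (x_i - \bar{x})^2 + \sum (y_i - \bar{y})^2}{n_1 + n_2 - 2}} = \sqrt{\frac{60 + 153.6}{14}} = 3.906$$
Under H0, the test statistic is given by
$$t = \frac{(68 - 67.8) - 0}{3.906\sqrt{\frac{1}{6} + \frac{1}{10}}} = 0.099 \sim t_{14} \text{ at 5\% level of significance}$$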
Inference:
The tabulated value of t at 5% level of significance for 14
degrees of freedom in a right tailed test is 1.761.
[t_{α, n1+n2-2} = t_{0.05, 14} = 1.761]
Here, t_cal < t_{α, n1+n2-2}, so we accept H0. Hence we conclude that there is no significant difference between the mean heights of the sailors and soldiers; the data do not suggest that sailors are taller than soldiers.
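Both two-sample examples can be checked with a pooled-variance sketch in Python. The function name is illustrative, and H0 is taken as μx - μy = 0.

```python
import math
import statistics


def t_two_means(xs, ys):
    """Pooled two-sample t with S^2 = [Σ(x-x̄)^2 + Σ(y-ȳ)^2] / (n1 + n2 - 2)."""
    n1, n2 = len(xs), len(ys)
    xbar, ybar = statistics.mean(xs), statistics.mean(ys)
    ss = sum((x - xbar) ** 2 for x in xs) + sum((y - ybar) ** 2 for y in ys)
    S = math.sqrt(ss / (n1 + n2 - 2))
    t = (xbar - ybar) / (S * math.sqrt(1 / n1 + 1 / n2))  # H0: mu_x - mu_y = 0
    return t, n1 + n2 - 2


mine1 = [8260, 8130, 8350, 8070, 8340]
mine2 = [7950, 7890, 7900, 8140, 7920, 7840]
sailors = [63, 65, 68, 69, 71, 72]
soldiers = [61, 62, 65, 66, 69, 69, 70, 71, 72, 73]
print(t_two_means(mine1, mine2))       # ≈ (4.19, 9)
print(t_two_means(sailors, soldiers))  # ≈ (0.099, 14)
```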
Test for equality of Two means (n1 = n2) (Paired t-test)
H0: μd = 0
H1: μd ≠ 0 (or) μd > 0 (or) μd < 0
[Any one of these 3 alternatives, depending on the given problem]
Level of significance:
Appropriate level of significance is α (given/chosen)
Test Statistic:
To test the above hypotheses the test statistic is given by
$$t = \frac{\bar{d} - \mu_d}{s_d/\sqrt{n}} \sim t_{n-1} \text{ at } \alpha \text{ level of significance}$$
$$\bar{d} = \frac{\sum d_i}{n}, \qquad s_d = \sqrt{\frac{\sum (d_i - \bar{d})^2}{n-1}}$$
where d_i is the difference between the i-th pair of observations.

Inference:
1) If it is a two-tailed test (≠), reject H0 if |t_cal| > t_{α/2, n-1}. Otherwise accept H0.
2) If it is a right-tailed test (>), reject H0 if t_cal > t_{α, n-1}. Otherwise accept H0.
3) If it is a left-tailed test (<), reject H0 if t_cal < -t_{α, n-1}. Otherwise accept H0.

Problem on paired t-test

1) The following are the average weekly losses of working hours due to accidents in 10 industrial plants before and after a certain safety program was put into operation. Use the 5% level of significance to test whether the safety program is effective.
Before:  45  73  46  124  33  57  83  34  26  17
After:   36  60  44  119  35  51  77  29  24  11
Sol:-
H0: μd = 0 (there is no difference in the average weekly loss of working hours before and after the safety program is put into operation)
H1: μd > 0 (the safety program is effective, i.e., the average loss decreased after the safety program was implemented)
Level of significance:
Appropriate level of significance is 5% (given)
Test Statistic:
To test the above hypotheses the test statistic is given by
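$$t = \frac{\bar{d} - \mu_d}{s_d/\sqrt{n}} \sim t_{n-1} \text{ at } \alpha \text{ level of significance}$$

d_i = Before - After:  9  13  2  5  -2  6  6  5  2  6

$$\bar{d} = \frac{\sum d_i}{n} = 5.2, \qquad s_d = \sqrt{\frac{\sum (d_i - \bar{d})^2}{n-1}} = 4.08$$
$$t = \frac{5.2 - 0}{4.08/\sqrt{10}} = 4.030 \sim t_9 \text{ at 5\% level of significance}$$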
Inference:
The tabulated value of t at 5% level of significance for 9
degrees of freedom in a right tailed test is 1.833
[t_{α, n-1} = t_{0.05, 10-1} = t_{0.05, 9} = 1.833]
Here, t_cal > t_{α, n-1}, so we reject H0. Hence we conclude that the safety program is effective.
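The paired test is the single-mean t-test applied to the differences d_i = Before - After. Below is a Python sketch reproducing the example. The table value 1.833 is hard-coded, since the standard library has no t quantiles, and the function name is illustrative.

```python
import math
import statistics


def paired_t(before, after):
    """t = d̄ / (s_d / sqrt(n)) with d_i = before_i - after_i; returns (t, d.f.)."""
    d = [b - a for b, a in zip(before, after)]
    n = len(d)
    return statistics.mean(d) / (statistics.stdev(d) / math.sqrt(n)), n - 1


before = [45, 73, 46, 124, 33, 57, 83, 34, 26, 17]
after = [36, 60, 44, 119, 35, 51, 77, 29, 24, 11]
t_cal, df = paired_t(before, after)
print(round(t_cal, 2), df, t_cal > 1.833)  # 4.03 9 True  (1.833 = t_{0.05, 9} from the table)
```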

Test for Single Variance/ Single S.D:

H0: σ² = σ0²
H1: σ² ≠ σ0² (or) σ² > σ0² (or) σ² < σ0²
[Any one of these 3 alternatives, depending on the given problem]
Level of significance:
Appropriate level of significance is α (given/chosen)
Test Statistic:
To test the above hypotheses the test statistic is given by
$$\chi^2 = \frac{\sum (x_i - \bar{x})^2}{\sigma^2} \sim \chi^2_{n-1} \text{ at } \alpha \text{ level of significance}$$
or, equivalently, in terms of the sample variance s² = Σ(x_i - x̄)²/(n - 1),
$$\chi^2 = \frac{(n-1)s^2}{\sigma^2} \sim \chi^2_{n-1} \text{ at } \alpha \text{ level of significance}$$

Inference:
1) If it is a two-tailed test (≠), reject H0 if χ²_cal > χ²_{α/2, n-1} or χ²_cal < χ²_{1-α/2, n-1}. Otherwise accept H0.
2) If it is a right-tailed test (>), reject H0 if χ²_cal > χ²_{α, n-1}. Otherwise accept H0.
3) If it is a left-tailed test (<), reject H0 if χ²_cal < χ²_{1-α, n-1}. Otherwise accept H0.

Problems related to test for single Variance.

1. If 12 observations of the specific heat of iron have a standard deviation of 0.0086, test the null hypothesis that σ = 0.01 for such observations. Use the alternative hypothesis σ ≠ 0.01 and the level of significance 0.01.
Sol:-
H0: σ = 0.01
H1: σ ≠ 0.01
Level of significance:
Appropriate level of significance is 1% (given)
Test Statistic:
To test the above hypotheses the test statistic is given by

$$\chi^2 = \frac{(n-1)s^2}{\sigma^2} \sim \chi^2_{n-1} \text{ at } \alpha \text{ level of significance}$$

Under H0, the test statistic is given by
$$\chi^2 = \frac{(12 - 1)(0.0086)^2}{(0.01)^2} = 8.1356$$

Inference:
The table value of χ² at 1% level of significance for 11 d.f. in a two-tailed test is 26.757
[χ²_{α/2, n-1} = χ²_{0.005, 11} = 26.757; the lower critical value is χ²_{0.995, 11} = 2.603]
Here, 2.603 < χ²_cal < 26.757, so we accept H0.

2. A random sample of 15 observations from a population is 14, 15, 13, 21, 14, 12, 15, 16, 18, 20, 22, 24, 14, 12, 10. Test whether the population variance is 7.5 at 5% level of significance.
Sol:-
H0: σ² = 7.5
H1: σ² ≠ 7.5
Level of significance:
Appropriate level of significance is 5% (given)
Test Statistic:
To test the above hypotheses the test statistic is given by

$$\chi^2 = \frac{\sum (x_i - \bar{x})^2}{\sigma^2} \sim \chi^2_{n-1} \text{ at } \alpha \text{ level of significance}$$

Here x̄ = 16 and Σ(x_i - x̄)² = 236.

Under H0, the test statistic is given by
$$\chi^2 = \frac{236}{7.5} = 31.4667$$
Inference:
The table value of χ² at 5% level of significance for 14 d.f. in a two-tailed test is 26.119
[χ²_{α/2, n-1} = χ²_{0.025, 14} = 26.119]
Here, χ²_cal > χ²_tab, so we reject H0.
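A few lines of Python compute both forms of the single-variance χ² statistic. The function names are illustrative. Only the statistic is computed; the critical values still come from χ² tables.

```python
def chi2_variance_from_s(n, s, sigma0):
    """χ² = (n - 1) s² / σ0², where s is the divisor-(n - 1) sample S.D."""
    return (n - 1) * s ** 2 / sigma0 ** 2, n - 1


def chi2_variance_from_data(xs, sigma0_sq):
    """χ² = Σ(x_i - x̄)² / σ0²."""
    xbar = sum(xs) / len(xs)
    return sum((x - xbar) ** 2 for x in xs) / sigma0_sq, len(xs) - 1


print(chi2_variance_from_s(12, 0.0086, 0.01))  # ≈ (8.1356, 11)
sample = [14, 15, 13, 21, 14, 12, 15, 16, 18, 20, 22, 24, 14, 12, 10]
print(chi2_variance_from_data(sample, 7.5))    # ≈ (31.4667, 14)
```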

Test for Two Variances/ Two S.Ds: F-Test

H0: σx² = σy²
H1: σx² ≠ σy² (or) σx² > σy² (or) σx² < σy²
[Any one of these 3 alternatives, depending on the given problem]
Level of significance:
Appropriate level of significance is α (given/chosen)
Test Statistic:
To test the above hypotheses the test statistic is given by
$$F = \frac{S_x^2}{S_y^2} \sim F_{n_1-1,\,n_2-1} \text{ at } \alpha \text{ level of significance}$$
where S_x² = Σ(x_i - x̄)²/(n1 - 1) and S_y² = Σ(y_i - ȳ)²/(n2 - 1).
Inference:
1) If it is a two-tailed test (≠), reject H0 if F_cal > F_{α, n1-1, n2-1}. Otherwise accept H0.
2) For a right-tailed test (>), reject H0 if F_cal > F_{α, n1-1, n2-1}; for a left-tailed test (<), reject H0 if F_cal < F_{1-α, n1-1, n2-1}. Otherwise accept H0.

Note:-
F_cal ≥ 1, i.e., the calculated value of F must always be greater than or equal to 1, so the larger sample variance is placed in the numerator. If not, the test statistic becomes 1/F, with the degrees of freedom interchanged.

1) Random samples from two normal populations are given below.
Sample 1:  16  26  27  23  24  22
Sample 2:  33  42  35  32  28  31

Do the population variances differ significantly?


Sol:-
H0: σx² = σy²
H1: σx² ≠ σy²
Level of significance:
Appropriate level of significance is 5% (chosen)
Test Statistic:
To test the above hypotheses the test statistic is given by

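$$F = \frac{S_x^2}{S_y^2} \sim F_{n_1-1,\,n_2-1} \text{ at } \alpha \text{ level of significance}$$
$$\bar{x} = 23, \qquad S_x^2 = \frac{\sum (x_i - \bar{x})^2}{n_1 - 1} = 15.2$$
$$\bar{y} = 33.5, \qquad S_y^2 = \frac{\sum (y_i - \bar{y})^2}{n_2 - 1} = 22.7$$
$$F = \frac{15.2}{22.7} = 0.6696 < 1, \text{ so we use } F = \frac{22.7}{15.2} = 1.4934 \sim F_{5,5} \text{ at 5\% level of significance}$$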
Inference:
The table value of F at 5% L.O.S for (5,5) d.f is 5.05.
[F_{α, n2-1, n1-1} = F_{0.05, 5, 5} = 5.05]
Since F_cal < F_tab, we accept H0.

2) Two independent random samples of 8 and 7 items, respectively, have the following values.
Sample 1:  9   11  13  11  15  9  12  14
Sample 2:  10  12  10  14  9   8  10

Test whether the difference between the variances is significant at 1% level of significance.
Sol:-
H0: σx² = σy²
H1: σx² ≠ σy²
Level of significance:
Appropriate level of significance is 1% (given)
Test Statistic:
To test the above hypotheses the test statistic is given by

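$$F = \frac{S_x^2}{S_y^2} \sim F_{n_1-1,\,n_2-1} \text{ at } \alpha \text{ level of significance}$$
$$\bar{x} = 11.75, \qquad S_x^2 = \frac{\sum (x_i - \bar{x})^2}{n_1 - 1} = 4.786$$
$$\bar{y} = 10.43, \qquad S_y^2 = \frac{\sum (y_i - \bar{y})^2}{n_2 - 1} = 3.952$$
$$F = \frac{4.786}{3.952} = 1.21 \ge 1 \sim F_{7,6} \text{ at 1\% level of significance}$$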
Inference:
The table value of F at 1% L.O.S for (7,6) d.f. is 8.26.
[F_{α, n1-1, n2-1} = F_{0.01, 7, 6} = 8.26]
Since F_cal < F_tab, we accept H0.
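Below is a Python sketch of the F statistic. It places the larger sample variance in the numerator, as the Note requires, and reports the degrees of freedom in the matching order. The function name is illustrative.

```python
import statistics


def f_test(xs, ys):
    """F = larger sample variance / smaller (divisor n - 1); returns (F, (d.f.1, d.f.2))."""
    vx, vy = statistics.variance(xs), statistics.variance(ys)
    if vx >= vy:
        return vx / vy, (len(xs) - 1, len(ys) - 1)
    return vy / vx, (len(ys) - 1, len(xs) - 1)  # inverted so that F >= 1


print(f_test([16, 26, 27, 23, 24, 22], [33, 42, 35, 32, 28, 31]))          # ≈ (1.4934, (5, 5))
print(f_test([9, 11, 13, 11, 15, 9, 12, 14], [10, 12, 10, 14, 9, 8, 10]))  # ≈ (1.2108, (7, 6))
```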

Large Sample Tests (n > 30)

Critical values of Z      Level of significance (α)
                          1%        5%        10%
Two-tailed test           2.58      1.96      1.645
Right-tailed test         2.33      1.645     1.28
Left-tailed test          -2.33     -1.645    -1.28

Test for Single mean:

H0: μ = μ0
H1: μ ≠ μ0 (or) μ > μ0 (or) μ < μ0
[Any one of these 3 alternatives, depending on the given problem]
Level of significance:
Appropriate level of significance is α (given/chosen)
Test Statistic:
To test the above hypotheses the test statistic is given by
$$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \sim N(0,1) \text{ at } \alpha \text{ level of significance}$$
If σ is unknown, the sample S.D. s is used in its place.
Inference:
1) If it is a two-tailed test (≠), reject H0 if |Z_cal| > Z_critical value. Otherwise accept H0.
2) If it is a right-tailed test (>), reject H0 if Z_cal > Z_critical value. Otherwise accept H0.
3) If it is a left-tailed test (<), reject H0 if Z_cal < Z_critical value (a negative number). Otherwise accept H0.

1) A trucking firm is suspicious of the claim that the average lifetime of certain tyres is at least 28,000 miles. To check this claim, the firm puts 40 of these tyres on its trucks and gets a mean lifetime of 27,468 miles with a standard deviation of 1,348 miles. What can we conclude if the probability of a Type I error is to be at most 0.01?
Sol:-
H0: μ = 28,000 miles
H1: μ < 28,000 miles
Level of significance:
Appropriate level of significance is 1% (given)
Test Statistic:
To test the above hypotheses the test statistic is given by

$$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \sim N(0,1) \text{ at } \alpha \text{ level of significance}$$
Under H0, the test statistic is given by
$$z = \frac{27468 - 28000}{1348/\sqrt{40}} = -2.50$$
Inference:
The critical value of Z at 1% level of significance in a left-tailed test is -2.33.
Here, Z_cal < Z_critical value, so we reject H0.

2) A sample of 400 individuals is found to have a mean height of 67.47 inches. Is it reasonable to regard the sample as drawn from a large population with mean height 67.39 inches and standard deviation 1.3 inches? Test at 1% level of significance.
Sol:-
H0: μ = 67.39
H1: μ ≠ 67.39
Level of significance:
Appropriate level of significance is 1% (given)
Test Statistic:
To test the above hypotheses the test statistic is given by

$$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \sim N(0,1) \text{ at } \alpha \text{ level of significance}$$

Under H0, the test statistic is given by
$$z = \frac{67.47 - 67.39}{1.3/\sqrt{400}} = 1.23$$
Inference:
The critical value of Z at 1% level of significance in a two-tailed test is 2.58. Here, |Z_cal| < Z_critical value, so we accept H0.

3) In 64 randomly selected hours of production, the mean and standard deviation of the number of acceptable pieces produced by an automatic stamping machine are x̄ = 1038 and s = 146. At the 0.05 level of significance, does this enable us to reject the null hypothesis μ = 1000 against the alternative hypothesis μ > 1000?
Sol:-
H0: μ = 1000
H1: μ > 1000
Level of significance:
Appropriate level of significance is 5% (given)
Test Statistic:
To test the above hypotheses the test statistic is given by

$$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \sim N(0,1) \text{ at } \alpha \text{ level of significance}$$

Under H0, the test statistic is given by
$$z = \frac{1038 - 1000}{146/\sqrt{64}} = 2.08$$
Inference:
The critical value of Z at 5% level of significance in a right-tailed test is 1.645.
Here, Z_cal > Z_critical value, so we reject H0.
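The three large-sample single-mean examples reduce to one formula. In each, the sample S.D. stands in for σ where σ is unknown. Below is a minimal Python sketch with an illustrative function name.

```python
import math


def z_single_mean(xbar, mu0, sigma, n):
    """z = (x̄ - mu0) / (sigma / sqrt(n)); for large n, s may stand in for sigma."""
    return (xbar - mu0) / (sigma / math.sqrt(n))


print(round(z_single_mean(27468, 28000, 1348, 40), 2))  # -2.5  (tyres, left-tailed)
print(round(z_single_mean(67.47, 67.39, 1.3, 400), 2))  # 1.23  (heights, two-tailed)
print(round(z_single_mean(1038, 1000, 146, 64), 2))     # 2.08  (stamping machine, right-tailed)
```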

Test for equality of Two means:

H0: μx = μy
H1: μx ≠ μy (or) μx > μy (or) μx < μy
[Any one of these 3 alternatives, depending on the given problem]
Level of significance:
Appropriate level of significance is α (given/chosen)
Test Statistic:
To test the above hypotheses the test statistic is given by
$$Z = \frac{(\bar{x} - \bar{y}) - (\mu_x - \mu_y)}{\sqrt{\frac{\sigma_x^2}{n_1} + \frac{\sigma_y^2}{n_2}}} \sim N(0,1) \text{ at } \alpha \text{ level of significance}$$
If σx and σy are unknown, the sample S.Ds are used in their place.
Inference:
1) If it is a two-tailed test (≠), reject H0 if |Z_cal| > Z_critical value. Otherwise accept H0.
2) If it is a right-tailed test (>), reject H0 if Z_cal > Z_critical value. Otherwise accept H0.
3) If it is a left-tailed test (<), reject H0 if Z_cal < Z_critical value (a negative number). Otherwise accept H0.

Problems related to test for equality of two means.

1) A company claims that its light bulbs are superior to those of its main competitor. A study showed that a sample of n1 = 40 of its bulbs has a mean lifetime of 647 hours with an S.D. of 27 hours, while a sample of n2 = 40 of the competitor's bulbs has a mean lifetime of 638 hours with an S.D. of 31 hours. Does this substantiate the claim at the 0.05 level of significance?
H0: μx = μy
H1: μx > μy
Level of significance:
Appropriate level of significance is 5% (given)
Test Statistic:
To test the above hypotheses the test statistic is given by

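$$Z = \frac{(\bar{x} - \bar{y}) - (\mu_x - \mu_y)}{\sqrt{\frac{\sigma_x^2}{n_1} + \frac{\sigma_y^2}{n_2}}} \sim N(0,1) \text{ at } \alpha \text{ level of significance}$$
Under H0, the test statistic becomes
$$Z = \frac{(647 - 638) - 0}{\sqrt{\frac{27^2}{40} + \frac{31^2}{40}}} = 1.3846$$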
Inference:
The critical value of Z at 5% level of significance in a right-tailed test is 1.645.
Here, Z_cal < Z_critical value, so we accept H0. Hence we conclude that there is no significant difference between the mean lifetimes of the two brands of bulbs, so the claim is not substantiated.
2) A college conducts both day and night classes intended to be equally effective. A sample of 100 day students yields examination results x̄1 = 72.4, S1 = 14.8. A sample of 200 night students yields examination results x̄2 = 73.9, S2 = 17.9. Are the two means statistically equal at 10% significance level?
H0: μx = μy
H1: μx ≠ μy
Level of significance:
Appropriate level of significance is 10% (given)
Test Statistic:
To test the above hypotheses the test statistic is given by

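$$Z = \frac{(\bar{x} - \bar{y}) - (\mu_x - \mu_y)}{\sqrt{\frac{\sigma_x^2}{n_1} + \frac{\sigma_y^2}{n_2}}} \sim N(0,1) \text{ at } \alpha \text{ level of significance}$$
Under H0, the test statistic becomes
$$Z = \frac{(72.4 - 73.9) - 0}{\sqrt{\frac{14.8^2}{100} + \frac{17.9^2}{200}}} = -0.7702$$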
Inference:
The critical value of Z at 10% level of significance in a two
tailed test is 1.645
Here, |Z_cal| < Z_critical value, so we accept H0. Hence we conclude that there is no significant difference between the means.
3) The mean yield of two sets and their variability are given below.
Test whether the difference in the mean yields of two sets is
significant at 5% level of significance.
                  Set I       Set II
Mean yield        1258 kg     1243 kg
S.D. per plot     34 kg       28 kg
No. of plots      40          60

H0: μx = μy
H1: μx ≠ μy
Level of significance:
Appropriate level of significance is 5% (given)
Test Statistic:
To test the above hypotheses the test statistic is given by

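$$Z = \frac{(\bar{x} - \bar{y}) - (\mu_x - \mu_y)}{\sqrt{\frac{\sigma_x^2}{n_1} + \frac{\sigma_y^2}{n_2}}} \sim N(0,1) \text{ at } \alpha \text{ level of significance}$$
Under H0, the test statistic becomes
$$Z = \frac{(1258 - 1243) - 0}{\sqrt{\frac{34^2}{40} + \frac{28^2}{60}}} = 2.3154$$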
Inference:
The critical value of Z at 5% level of significance in a two
tailed test is 1.96
Here, Z_cal > Z_critical value, so we reject H0. Hence we conclude that the mean yields of the two sets are not equal.
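Here is the large-sample two-mean statistic, sketched in Python with the sample S.Ds in place of σx and σy (illustrative function name). It reproduces the three examples above.

```python
import math


def z_two_means(x1, x2, s1, s2, n1, n2):
    """z = (x̄1 - x̄2) / sqrt(s1²/n1 + s2²/n2) under H0: mu1 = mu2."""
    return (x1 - x2) / math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)


print(round(z_two_means(647, 638, 27, 31, 40, 40), 4))          # bulbs
print(round(z_two_means(72.4, 73.9, 14.8, 17.9, 100, 200), 4))  # day vs night students
print(round(z_two_means(1258, 1243, 34, 28, 40, 60), 4))        # yields of the two sets
```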

Test for Single Variance/ Single S.D:

H0: σ² = σ0²
H1: σ² ≠ σ0² (or) σ² > σ0² (or) σ² < σ0²
[Any one of these 3 alternatives, depending on the given problem]
Level of significance:
Appropriate level of significance is α (given/chosen)
Test Statistic:
To test the above hypotheses the test statistic is given by
$$z = \frac{s - \sigma}{\sigma/\sqrt{2n}} \sim N(0,1) \text{ at } \alpha \text{ level of significance}$$
Inference:
1) If it is a two-tailed test (≠), reject H0 if |Z_cal| > Z_critical value. Otherwise accept H0.
2) If it is a right-tailed test (>), reject H0 if Z_cal > Z_critical value. Otherwise accept H0.
3) If it is a left-tailed test (<), reject H0 if Z_cal < Z_critical value (a negative number). Otherwise accept H0.

1) A random sample of size 300 is drawn from a population, and the sample variance is observed to be 113.59. Can the sample be regarded as drawn from a population with variance 100?
H0: σ² = 100
H1: σ² ≠ 100
Level of significance:
Appropriate level of significance is 5% (chosen)
Test Statistic:
To test the above hypotheses the test statistic is given by

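$$z = \frac{s - \sigma}{\sigma/\sqrt{2n}} \sim N(0,1) \text{ at } \alpha \text{ level of significance}$$

Here s = √113.59 = 10.6578 and σ = √100 = 10, so
$$z = \frac{10.6578 - 10}{10/\sqrt{2 \times 300}} = 1.6113$$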

Inference:
The critical value of z at 5% level of significance in a two-tailed test is 1.96. Since |Z_cal| < Z_critical value, we accept H0. Hence we conclude that the sample has been drawn from a population whose variance is 100.

2) A random sample of size 900 is drawn from a population, and the sample S.D. is observed to be 2.52. Can the sample be regarded as drawn from a population with S.D. 2.56?
H0: σ = 2.56
H1: σ ≠ 2.56
Level of significance:
Appropriate level of significance is 5% (chosen)
Test Statistic:
To test the above hypotheses the test statistic is given by

$$z = \frac{s - \sigma}{\sigma/\sqrt{2n}} \sim N(0,1) \text{ at } \alpha \text{ level of significance}$$

$$z = \frac{2.52 - 2.56}{2.56/\sqrt{2 \times 900}} = -0.663$$

Inference:
The critical value of z at 5% level of significance in a two-tailed test is 1.96. Since |Z_cal| < Z_critical value, we accept H0. Hence we conclude that the sample has been drawn from a population whose S.D. is 2.56.
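Below is a Python sketch of the large-sample test for one S.D. (illustrative function name). When a sample variance is given, its square root is taken first, as in the first example.

```python
import math


def z_single_sd(s, sigma0, n):
    """z = (s - sigma0) / (sigma0 / sqrt(2n)), the large-sample test for one S.D."""
    return (s - sigma0) / (sigma0 / math.sqrt(2 * n))


print(round(z_single_sd(math.sqrt(113.59), 10, 300), 4))  # variance given, so s = sqrt(113.59)
print(round(z_single_sd(2.52, 2.56, 900), 4))
```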
Test for Two Variances/ Two S.Ds:
H0: σx² = σy²
H1: σx² ≠ σy² (or) σx² > σy² (or) σx² < σy²
[Any one of these 3 alternatives, depending on the given problem]
Level of significance:
Appropriate level of significance is α (given/chosen)
Test Statistic:
To test the above hypotheses the test statistic is given by
$$z = \frac{(s_x - s_y) - (\sigma_x - \sigma_y)}{\sqrt{\frac{\sigma_x^2}{2n_1} + \frac{\sigma_y^2}{2n_2}}} \sim N(0,1) \text{ at } \alpha \text{ level of significance}$$
If σx and σy are unknown, the sample S.Ds s_x and s_y are used in their place.

Inference:
1) If it is a two-tailed test (≠), reject H0 if |Z_cal| > Z_critical value. Otherwise accept H0.
2) If it is a right-tailed test (>), reject H0 if Z_cal > Z_critical value. Otherwise accept H0.
3) If it is a left-tailed test (<), reject H0 if Z_cal < Z_critical value (a negative number). Otherwise accept H0.

1) The mean yield of two sets and their variability are given below.
i) Test whether the difference in the mean yields of the two sets is significant.
ii) Test whether the difference in the variability in yield is significant.
                  Set I       Set II
Mean yield        1258 kg     1243 kg
S.D. per plot     34 kg       28 kg
No. of plots      40          60

H0: μx = μy
H1: μx ≠ μy
Level of significance:
Appropriate level of significance is 5% (chosen)
Test Statistic:
To test the above hypotheses the test statistic is given by

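$$Z = \frac{(\bar{x} - \bar{y}) - (\mu_x - \mu_y)}{\sqrt{\frac{\sigma_x^2}{n_1} + \frac{\sigma_y^2}{n_2}}} \sim N(0,1) \text{ at } \alpha \text{ level of significance}$$
Under H0, the test statistic becomes
$$Z = \frac{(1258 - 1243) - 0}{\sqrt{\frac{34^2}{40} + \frac{28^2}{60}}} = 2.3154$$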
Inference:
The critical value of Z at 5% level of significance in a two-tailed test is 1.96. Here, Z_cal > Z_critical value, so we reject H0. Hence we conclude that the mean yields of the two sets are not equal.

H0: σx² = σy²
H1: σx² ≠ σy²
Level of significance:
Appropriate level of significance is 5% (chosen)
Test Statistic:
To test the above hypotheses the test statistic is given by

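$$z = \frac{(s_x - s_y) - (\sigma_x - \sigma_y)}{\sqrt{\frac{\sigma_x^2}{2n_1} + \frac{\sigma_y^2}{2n_2}}} \sim N(0,1) \text{ at } \alpha \text{ level of significance}$$

$$z = \frac{(34 - 28) - 0}{\sqrt{\frac{34^2}{2 \times 40} + \frac{28^2}{2 \times 60}}} = 1.31$$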

Inference:
The critical value of Z at 5% level of significance in a two-tailed test is 1.96. Here, Z_cal < Z_critical value, so we accept H0. Hence we conclude that the variability of the two sets does not differ significantly.

2) Random samples drawn from two countries gave the following data relating to the heights of adult males.
i) Is the difference between the means significant?
ii) Is the difference between the S.Ds significant?
                    Country A       Country B
Mean height         67.42 inches    67.25 inches
S.D. of heights     2.58 inches     2.50 inches
No. of samples      1000            1200

H0: μx = μy
H1: μx ≠ μy
Level of significance:
Appropriate level of significance is 5% (chosen)
Test Statistic:
To test the above hypotheses the test statistic is given by

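$$Z = \frac{(\bar{x} - \bar{y}) - (\mu_x - \mu_y)}{\sqrt{\frac{\sigma_x^2}{n_1} + \frac{\sigma_y^2}{n_2}}} \sim N(0,1) \text{ at } \alpha \text{ level of significance}$$
Under H0, the test statistic becomes
$$Z = \frac{(67.42 - 67.25) - 0}{\sqrt{\frac{2.58^2}{1000} + \frac{2.50^2}{1200}}} = 1.561$$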
Inference:
The critical value of Z at 5% level of significance in a two-tailed test is 1.96. Here, Z_cal < Z_critical value, so we accept H0. Hence we conclude that there is no significant difference between the mean heights of adult males in the two countries.

H0: σx² = σy²
H1: σx² ≠ σy²
Level of significance:
Appropriate level of significance is 5% (chosen)
Test Statistic:
To test the above hypotheses the test statistic is given by

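$$z = \frac{(s_x - s_y) - (\sigma_x - \sigma_y)}{\sqrt{\frac{\sigma_x^2}{2n_1} + \frac{\sigma_y^2}{2n_2}}} \sim N(0,1) \text{ at } \alpha \text{ level of significance}$$
$$z = \frac{(2.58 - 2.50) - 0}{\sqrt{\frac{2.58^2}{2 \times 1000} + \frac{2.50^2}{2 \times 1200}}} = 1.04$$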

Inference:
The critical value of Z at 5% level of significance in a two-tailed test is 1.96. Here, Z_cal < Z_critical value, so we accept H0. Hence we conclude that the S.Ds of heights in the two countries do not differ significantly.
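Here is the two-S.D. statistic in Python, with sample S.Ds standing in for σx and σy (illustrative function name). It is checked against the yield example.

```python
import math


def z_two_sds(s1, s2, n1, n2):
    """z = (s1 - s2) / sqrt(s1²/(2 n1) + s2²/(2 n2)), sample S.Ds standing in for sigma."""
    return (s1 - s2) / math.sqrt(s1 ** 2 / (2 * n1) + s2 ** 2 / (2 * n2))


print(round(z_two_sds(34, 28, 40, 60), 2))  # yields of the two sets: 1.31
```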

Test for Single Proportion:

H0: P = P0
H1: P ≠ P0 (or) P > P0 (or) P < P0
[Any one of these 3 alternatives, depending on the given problem]
Level of significance:
Appropriate level of significance is α (given/chosen)
Test Statistic:
To test the above hypotheses the test statistic is given by
$$z = \frac{p - P}{\sqrt{\frac{PQ}{n}}} \sim N(0,1) \text{ at } \alpha \text{ level of significance}$$
where p is the sample proportion and Q = 1 - P.
Under H0, the test statistic becomes
$$z = \frac{p - P_0}{\sqrt{\frac{P_0 Q_0}{n}}} \sim N(0,1) \text{ at } \alpha \text{ level of significance}$$

Inference:
1) If it is a two-tailed test (≠), reject H0 if |Z_cal| > Z_critical value. Otherwise accept H0.
2) If it is a right-tailed test (>), reject H0 if Z_cal > Z_critical value. Otherwise accept H0.
3) If it is a left-tailed test (<), reject H0 if Z_cal < Z_critical value (a negative number). Otherwise accept H0.

Problems related to test for Single Proportion.


1) A coin was tossed 400 times and the head turned up 2/6 times. Test
the hypothesis that the coin is unbiased at 5% level of significance.
H0: P = 1/2 (The coin is unbiased, i.e., if an unbiased coin is tossed, the probability of a head turning up is 1/2.)
H1: P ≠ 1/2.
Level of significance:
Appropriate level of significance is 5% (given)
Test Statistic:
To test the above hypotheses the test statistic is given by

$$z = \frac{p - P}{\sqrt{\frac{PQ}{n}}} \sim N(0,1) \text{ at } \alpha \text{ level of significance}$$
Under H0, the test statistic becomes
$$z = \frac{0.3333 - 0.5}{\sqrt{\frac{0.5 \times 0.5}{400}}} = -6.668$$
Inference:
The critical value of Z at 5% level of significance in a two-tailed test is 1.96.
Here, |Z_cal| > Z_critical value, so we reject H0. Hence we conclude that the coin is biased.

2) A certain cubical die was thrown 9000 times, and a 5 or 6 was obtained 3240 times. Assuming random throwing, do the data indicate that the die is unbiased?
H0: P = 2/6 = 1/3 (if an unbiased die is thrown, the probability of getting 5 or 6 is 2/6)
H1: P ≠ 1/3.
Level of significance:
Appropriate level of significance is 5% (chosen)
Test Statistic:
To test the above hypotheses the test statistic is given by
$$z = \frac{p - P}{\sqrt{\frac{PQ}{n}}} \sim N(0,1) \text{ at } \alpha \text{ level of significance}$$
Under H0, the test statistic becomes
$$z = \frac{0.36 - 0.3333}{\sqrt{\frac{0.3333 \times 0.6667}{9000}}} = 5.37$$
Inference:
The critical value of Z at 5% level of significance in a two-tailed test is 1.96. Here, Z_cal > Z_critical value, so we reject H0.
Hence we conclude that the die is biased.
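Below is a Python sketch of the single-proportion statistic (illustrative function name), applied to the die example, where p = 3240/9000 = 0.36 and P0 = 1/3.

```python
import math


def z_single_proportion(successes, n, p0):
    """z = (p - P0) / sqrt(P0 * Q0 / n) with p = successes / n and Q0 = 1 - P0."""
    p = successes / n
    return (p - p0) / math.sqrt(p0 * (1 - p0) / n)


print(round(z_single_proportion(3240, 9000, 1 / 3), 2))  # die example: 5.37
```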

Problems related to test for Two Proportions.


1) In a sample of 600 men from a certain city, 450 men are found to
be smokers. In a sample of 900 from another city 450 are found to
be smokers. Do the data indicate that the two cities are significantly
different with respect to prevalence of smoking habit among men?
Sol:-
H0: Px = Py
H1: Px ≠ Py
Level of significance:
Appropriate level of significance is 5% (chosen)
Test Statistic:
To test the above hypotheses the test statistic is given by
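$$z = \frac{(p_x - p_y) - (P_x - P_y)}{\sqrt{PQ\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} \sim N(0,1) \text{ at } \alpha \text{ level of significance}$$
$$p_x = \frac{x}{n_1} = \frac{450}{600} = 0.75, \qquad p_y = \frac{y}{n_2} = \frac{450}{900} = 0.5, \qquad P = \frac{x + y}{n_1 + n_2} = \frac{450 + 450}{600 + 900} = 0.6, \qquad Q = 1 - P = 0.4$$

Under H0, the test statistic becomes
$$z = \frac{(0.75 - 0.5) - 0}{\sqrt{0.6 \times 0.4\left(\frac{1}{600} + \frac{1}{900}\right)}} = 9.6824$$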

Inference:
The critical value of Z at 5% level of significance in a two
tailed test is 1.96
Here, Zcal > Zcritical value. So, we reject H0. Hence we conclude
that the two cities are significantly different with respect to
prevalence of smoking habit among men.

2) 58 of 2000 randomly sampled corporations had their 1992 tax returns audited. In another sample of 2500 corporations, 61 had their 1991 returns audited. Was the fraction of corporate returns audited in 1992 significantly different from the 1991 fraction? Test the appropriate hypotheses at α = 0.01.
Sol:-
H0: Px = Py
H1: Px ≠ Py
Level of significance:
Appropriate level of significance is 1% (given)
Test Statistic:
To test the above hypotheses the test statistic is given by
(p x p y ) (Px Py )
z ~ N(0,1) at % level of significan ce
1 1
PQ
n1 n 2
x 58 y 61 xy 58 61
px 0.029, p y 0.0244, P 0.0264
n 1 2000 n 2 2500 n 1 n 2 2000 2500

Under H0, the test statistic becomes


(0.029 0.0244) - (0)
z 0.9583
1 1
0.0264 0.9736
2000 2500
Inference:
The critical value of Z at 5% level of significance in a two
tailed test is 1.96
Here, Zcal < Zcritical value. So, we accept H0. Hence we conclude
that there is no significant difference between the fraction of
corporate returns audited in the years 1992 and 1991.
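Both two-proportion problems can be reproduced with one small function. This is a sketch under the assumptions of the test statistic above (pooled P, Px - Py = 0 under H0); the name `two_proportion_z` is hypothetical. It uses the exact pooled P (0.026444...) rather than the rounded 0.0264, so the third decimal can differ slightly from hand working.

```python
import math

def two_proportion_z(x: int, n1: int, y: int, n2: int) -> float:
    """z for H0: Px = Py, using the pooled estimate P = (x + y)/(n1 + n2)."""
    px, py = x / n1, y / n2
    P = (x + y) / (n1 + n2)
    Q = 1 - P
    std_error = math.sqrt(P * Q * (1 / n1 + 1 / n2))
    return (px - py) / std_error  # (Px - Py) = 0 under H0

z_smoking = two_proportion_z(450, 600, 450, 900)  # smokers in the two cities
z_audit = two_proportion_z(58, 2000, 61, 2500)     # audited returns, 1992 vs 1991
print(f"{z_smoking:.2f} {z_audit:.2f}")  # 9.68 0.96
```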

Fit a binomial distribution to the following data.
X   0   1    2     3     4     5    6
F   7   64   140   210   132   75   12
Sol:-
x     f     fx     P(x) = nCx p^x q^(n-x)   N.P(x)   Expected f
0     7     0      0.014270                 9.13     9
1     64    64     0.088230                 56.47    56
2     140   280    0.227294                 145.47   145
3     210   630    0.312289                 199.87   200
4     132   528    0.241350                 154.46   154
5     75    375    0.099480                 63.67    64
6     12    72     0.017085                 10.93    11
Total 640   1949
Mean = Σfx/Σf = 1949/640 = 3.0453
Since the mean of a binomial distribution is np with n = 6,
p = 3.0453/6 = 0.5075, q = 1 - p = 0.4925
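The fitting steps (mean, p = mean/n, expected frequencies N·nCx p^x q^(n-x)) can be sketched in Python. This is an illustrative sketch, not the notes' own program: `fit_binomial` is a hypothetical name, and it keeps p unrounded (0.50755...), so expected frequencies may differ from the table in the last decimal.

```python
from math import comb

def fit_binomial(freqs: list[int], n: int) -> tuple[float, list[float]]:
    """Fit B(n, p) to observed frequencies f(0..n) by matching the mean n*p."""
    N = sum(freqs)
    mean = sum(x * f for x, f in enumerate(freqs)) / N
    p = mean / n
    q = 1 - p
    expected = [N * comb(n, x) * p**x * q**(n - x) for x in range(n + 1)]
    return p, expected

p, expected = fit_binomial([7, 64, 140, 210, 132, 75, 12], n=6)
print(round(p, 4))                      # 0.5076
print([round(e, 2) for e in expected])  # expected frequencies N*P(x)
```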
The following mistakes per page were observed in a book. Fit a Poisson
distribution and test the goodness of fit.
No. of mistakes / page   0     1    2    3   4
No. of pages             211   90   19   5   0
Sol:-
X     f     fx    P(x) = e^(-λ) λ^x / x!   N.P(x)   Expected f
0     211   0     0.644036                 209.31   209
1     90    90    0.283376                 92.10    92
2     19    38    0.062343                 20.26    20
3     5     15    0.009144                 2.97     3
4     0     0     0.001006                 0.33     0
Total 325   143

λ = mean = Σfx/Σf = 143/325 = 0.44
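A matching sketch for the Poisson fit, taking λ as the sample mean as in the working above. The function name `fit_poisson` is hypothetical. Because the table stops at x = 4, the expected frequencies do not add up to exactly N = 325.

```python
import math

def fit_poisson(freqs: list[int]) -> tuple[float, list[float]]:
    """Fit a Poisson distribution with lambda = sample mean; return expected N*P(x)."""
    N = sum(freqs)
    lam = sum(x * f for x, f in enumerate(freqs)) / N
    expected = [N * math.exp(-lam) * lam**x / math.factorial(x)
                for x in range(len(freqs))]
    return lam, expected

lam, expected = fit_poisson([211, 90, 19, 5, 0])
print(lam)
print([round(e, 2) for e in expected])
```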

χ²-test

Test for independence of attributes (χ² test):

H0: Factors in the contingency table are independent
H1: Factors in the contingency table are dependent.
Level of significance:
Appropriate level of significance is α% (given/chosen)
Test Statistic:
To test the above hypotheses the test statistic is given by
$$ \chi^2 = \sum \frac{(O - E)^2}{E} \sim \chi^2_{(r-1)(c-1)} \text{ at } \alpha\% \text{ level of significance} $$
where O is the observed frequency in the (i, j)th cell and E is the corresponding expected frequency,
$$ E = \frac{(i\text{th row total}) \times (j\text{th column total})}{\text{sample size}} $$
Inference:
If χ²_cal > χ²_{α,(r-1)(c-1)}, we reject H0. Otherwise we accept H0.

1) In a large manufacturing factory, a survey was conducted regarding
three types of bonus schemes. Total employees were divided into
four categories, namely laborers, clerks, technicians and executives.
The results obtained by way of an opinion survey are presented in the
form of a contingency table as given below. Test, at 5% level of significance,
whether the employee category and the choice of bonus scheme are independent.

EMPLOYEE CATEGORY   BONUS SCHEMES
                    Type 1   Type 2   Type 3
Labour              190      243      197
Clerks              82       44       44
Technicians         23       78       34
Executives          5        12       8
Sol:- H0: Factors in the contingency table are independent.
H1: Factors in the contingency table are dependent.

Appropriate level of significance is 5% (given)

                    BONUS SCHEMES
EMPLOYEE CATEGORY   Type 1   Type 2   Type 3   TOTAL
Labour              190      243      197      630
  Expected count    196.9    247.4    185.7
Clerks              82       44       44       170
  Expected count    53.1     66.8     50.1
Technicians         23       78       34       135
  Expected count    42.2     53.0     39.8
Executives          5        12       8        25
  Expected count    7.8      9.8      7.4
Total               300      377      283      960

The calculated value of χ² = 48.101

The table value of χ²_{0.05,6} = 12.59, with (4 - 1)(3 - 1) = 6 d.f.

Since χ²_cal > χ²_tab, H0 is rejected.

2) To determine whether there is really a relationship between an
employee's performance in the company's training program and his
success in the job, a sample of 400 cases was taken and the
following results were obtained. Test at 1% l.o.s. whether
performance in the training program and success in the job are
independent. The table is as given below.

                 Performance in training program
Success in job   Below Avg.   Avg.   Above Avg.
Poor             23           60     29
Avg.             28           79     60
Very good        9            49     63
Sol:- H0: Factors in the contingency table are independent.
H1: Factors in the contingency table are dependent.

Appropriate level of significance is 1% (given)

                 Performance in training program
Success in job   Below Avg.   Avg.    Above Avg.   Total
Poor             23           60      29           112
  Expected count 16.8         52.6    42.6
Avg.             28           79      60           167
  Expected count 25.0         78.5    63.5
Very good        9            49      63           121
  Expected count 18.2         56.9    46.0
Total            60           188     152          400

The calculated value of χ² = 20.179

The table value of χ²_{0.01,4} = 13.277, with (3 - 1)(3 - 1) = 4 d.f.

Since χ²_cal > χ²_tab, H0 is rejected.
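The expected counts and χ² values in both problems can be recomputed directly from the formula E = (row total × column total)/sample size. This is a sketch using only the standard library; `chi_square_independence` is a hypothetical name. It leaves the table lookup (12.59, 13.277) to the χ² tables as in the notes.

```python
def chi_square_independence(observed: list[list[float]]):
    """Return (chi2, degrees of freedom, expected table) for an r x c contingency table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    # E(i, j) = (ith row total * jth column total) / sample size
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    chi2 = sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(observed, expected)
               for o, e in zip(obs_row, exp_row))
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, df, expected

bonus = [[190, 243, 197], [82, 44, 44], [23, 78, 34], [5, 12, 8]]
training = [[23, 60, 29], [28, 79, 60], [9, 49, 63]]
for name, table in [("bonus", bonus), ("training", training)]:
    chi2, df, _ = chi_square_independence(table)
    print(f"{name}: chi2 = {chi2:.3f}, df = {df}")
```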

ANOVA one-way classification

H0: There is homogeneity among the means, i.e. μ1 = μ2 = ... = μk

H1: There is no homogeneity among the means, i.e. the μi are not all equal

Appropriate level of significance is α% (given/chosen)

To test the above hypotheses the procedure is as follows:

Calculations:
$$ \text{Raw sum of squares (R.S.S)} = \sum X_i^2, \quad X_i = i\text{th observation} $$
$$ \text{Correction factor (C.F)} = \frac{G^2}{N} $$
where G is the grand total and N is the number of observations in the entire experiment.
$$ \text{Sum of squares due to total (S.S.T)} = S_T^2 = \text{R.S.S} - \text{C.F} $$
$$ \text{Sum of squares due to treatments (S.S.tr)} = S_{tr}^2 = \sum_i \frac{T_i^2}{n_i} - \text{C.F} $$
where T_i = ith row total and n_i = number of observations in the ith row.
$$ \text{Sum of squares due to error (S.S.E)} = S_e^2 = \text{S.S.T} - \text{S.S.tr} = S_T^2 - S_{tr}^2 $$

Source of    Sum of    Degrees of   Mean Sum of              Variance
Variation    Squares   freedom      Squares                  Ratio
Treatments   S_tr²     k - 1        s_tr² = S_tr²/(k - 1)    F = s_tr²/s_e² ~ F(k - 1, N - k)
Error        S_e²      N - k        s_e² = S_e²/(N - k)
Total        S_T²      N - 1        ---------                ----------

Inference:
If F_cal > F_{α,k-1,N-k}, we reject H0. Otherwise we accept H0.
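The calculation steps above (R.S.S, C.F, S.S.T, S.S.tr, S.S.E, then F) map directly onto a short function. This is a sketch, not the notes' own program; `one_way_anova` is a hypothetical name and the data lists are taken from Problems 1 and 2 below. The F table value still has to be looked up separately.

```python
def one_way_anova(groups: list[list[float]]) -> dict:
    """One-way ANOVA via R.S.S, C.F and treatment totals."""
    obs = [x for g in groups for x in g]
    N, k = len(obs), len(groups)
    G = sum(obs)                      # grand total
    cf = G * G / N                    # correction factor
    rss = sum(x * x for x in obs)     # raw sum of squares
    ss_total = rss - cf
    ss_tr = sum(sum(g) ** 2 / len(g) for g in groups) - cf
    ss_e = ss_total - ss_tr
    ms_tr, ms_e = ss_tr / (k - 1), ss_e / (N - k)
    return {"SS_tr": ss_tr, "SS_e": ss_e, "SS_T": ss_total,
            "df": (k - 1, N - k), "F": ms_tr / ms_e}

glue = [[13, 10, 8, 11, 8], [13, 11, 14, 14], [4, 1, 3, 4, 2, 4]]
bolts = [[90, 82, 79, 98, 83, 91], [105, 89, 93, 104, 89, 95, 86], [83, 89, 80, 94]]
for name, data in [("glue", glue), ("bolts", bolts)]:
    r = one_way_anova(data)
    print(name, round(r["SS_tr"], 2), round(r["SS_e"], 2), r["df"], round(r["F"], 3))
```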

Problem 1.
Suppose 3 drying formulas for curing a glue are studied and the following
times are observed. Carry out ANOVA one-way classification at 5% L.O.S.
and comment.
Formula A   13   10   8    11   8
Formula B   13   11   14   14
Formula C   4    1    3    4    2    4
Sol:-
H0: There is homogeneity among the means, i.e. μA = μB = μC
H1: There is no homogeneity among the means, i.e. μA, μB, μC are not all equal
Appropriate level of significance is 5% (given)
            Observations                    Ti        Ti²/ni
Formula A   13   10   8    11   8    ---    50        500
Formula B   13   11   14   14   ---  ---    52        676
Formula C   4    1    3    4    2    4      18        54
                                            G = 120   ΣTi²/ni = 1230

$$ \text{Raw sum of squares (R.S.S)} = \sum X_i^2 = 1262 $$
$$ \text{C.F} = \frac{G^2}{N} = \frac{120^2}{15} = 960 $$
$$ \text{S.S.T} = S_T^2 = \text{R.S.S} - \text{C.F} = 1262 - 960 = 302 $$
$$ \text{S.S.tr} = S_{tr}^2 = \sum \frac{T_i^2}{n_i} - \text{C.F} = 1230 - 960 = 270 $$
$$ \text{S.S.E} = S_e^2 = S_T^2 - S_{tr}^2 = 302 - 270 = 32 $$

Source of      Sum of    Degrees of   Mean Sum of   Variance
Variation      Squares   freedom      Squares       Ratio
Treatments     270       2            135           F = 50.625
(Formulae)
Error          32        12           2.667
Total          302       14

Inference
The table value of F at 5% level of significance for (2, 12) d.f. is 3.89
{F_{α,k-1,N-k} = F_{0.05,2,12} = 3.89}

F_cal > F_tab, so we reject H0. Hence we conclude that the mean drying times μA, μB, μC are not all equal.

Problem 2.
As a part of the investigation of the collapse of the roof of a building, a testing
laboratory is given all the available bolts that connected the steel structure at
3 different positions on the roof. The forces required to shear each of these
bolts are as follows. Perform an ANOVA to test at 0.05 L.O.S. whether the
differences among the sample means at the 3 positions are significant.
Position 1   90    82   79   98    83   91
Position 2   105   89   93   104   89   95   86
Position 3   83    89   80   94
Sol:-
H0: There is homogeneity among the means, i.e. μ1 = μ2 = μ3
H1: There is no homogeneity among the means, i.e. μ1, μ2, μ3 are not all equal
Appropriate level of significance is 5% (given)
             Observations                          Ti         Ti²/ni
Position 1   90    82   79   98    83   91         523        45,588.17
Position 2   105   89   93   104   89   95   86    661        62,417.28
Position 3   83    89   80   94                    346        29,929.00
                                                   G = 1530   ΣTi²/ni = 1,37,934.45

$$ \text{Raw sum of squares (R.S.S)} = \sum X_i^2 = 138638 $$
$$ \text{C.F} = \frac{G^2}{N} = \frac{1530^2}{17} = 137700 $$
$$ \text{S.S.T} = S_T^2 = \text{R.S.S} - \text{C.F} = 138638 - 137700 = 938 $$
$$ \text{S.S.tr} = S_{tr}^2 = \sum \frac{T_i^2}{n_i} - \text{C.F} = 137934.45 - 137700 = 234.45 $$
$$ \text{S.S.E} = S_e^2 = S_T^2 - S_{tr}^2 = 938 - 234.45 = 703.55 $$

Source of      Sum of    Degrees of   Mean Sum of   Variance
Variation      Squares   freedom      Squares       Ratio
Treatments     234.45    2            117.225       F = 2.333
(Positions)
Error          703.55    14           50.253
Total          938       16
Inference
The table value of F at 5% level of significance for (2, 14) d.f. is 3.74
{F_{α,k-1,N-k} = F_{0.05,2,14} = 3.74}
F_cal < F_tab, so we accept H0. Hence we conclude that the differences among the sample means at the 3 positions are not significant.
ANOVA two-way classification
Null Hypothesis
H0(tr): There is homogeneity among the treatments, i.e. μ1. = μ2. = ... = μk.
H0(b): There is homogeneity among the blocks, i.e. μ.1 = μ.2 = ... = μ.h
Alternative Hypothesis
H1(tr): There is no homogeneity among the treatments, i.e. the μi. are not all equal
H1(b): There is no homogeneity among the blocks, i.e. the μ.j are not all equal
Appropriate level of significance is α% (given/chosen)

Treatments \ Blocks   1     2     . . .   h     Ti    Ti²
1                     X11   X12   . . .   X1h   T1    T1²
2                     X21   X22   . . .   X2h   T2    T2²
.                     .     .     . . .   .     .     .
k                     Xk1   Xk2   . . .   Xkh   Tk    Tk²
Bj                    B1    B2    . . .   Bh    G     ΣTi²
Bj²                   B1²   B2²   . . .   Bh²         ΣBj²

$$ \text{Raw sum of squares (R.S.S)} = \sum_i \sum_j X_{ij}^2 $$
$$ \text{Correction factor (C.F)} = \frac{G^2}{N} $$
where G is the grand total and N = kh is the number of observations in the experiment.
$$ \text{S.S.T} = S_T^2 = \text{R.S.S} - \text{C.F} $$
$$ \text{S.S.tr} = S_{tr}^2 = \frac{1}{h}\sum_{i=1}^{k} T_i^2 - \text{C.F} $$
$$ \text{S.S.b} = S_b^2 = \frac{1}{k}\sum_{j=1}^{h} B_j^2 - \text{C.F} $$
$$ \text{S.S.E} = S_e^2 = \text{S.S.T} - \text{S.S.tr} - \text{S.S.b} = S_T^2 - S_{tr}^2 - S_b^2 $$

Source of    Sum of    Degrees of       Mean Sum of                     Variance
Variation    Squares   freedom          Squares                         Ratio
Treatments   S_tr²     k - 1            s_tr² = S_tr²/(k - 1)           F_tr = s_tr²/s_e² ~ F(k - 1, (k - 1)(h - 1))
Blocks       S_b²      h - 1            s_b² = S_b²/(h - 1)             F_b = s_b²/s_e² ~ F(h - 1, (k - 1)(h - 1))
Error        S_e²      (k - 1)(h - 1)   s_e² = S_e²/((k - 1)(h - 1))
Total        S_T²      kh - 1           ---------                       ----------
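The two-way procedure (one observation per cell) can be sketched the same way: treatment sums of squares from the row totals Ti, block sums of squares from the column totals Bj. This is a hypothetical helper, `two_way_anova`, checked here against the two worked problems that follow; table values of F are still looked up by hand.

```python
def two_way_anova(table: list[list[float]]) -> dict:
    """Two-way ANOVA, one observation per cell: rows = k treatments, columns = h blocks."""
    k, h = len(table), len(table[0])
    N = k * h
    G = sum(sum(row) for row in table)
    cf = G * G / N
    rss = sum(x * x for row in table for x in row)
    ss_total = rss - cf
    ss_tr = sum(sum(row) ** 2 for row in table) / h - cf       # (1/h) sum Ti^2 - C.F
    ss_b = sum(sum(col) ** 2 for col in zip(*table)) / k - cf  # (1/k) sum Bj^2 - C.F
    ss_e = ss_total - ss_tr - ss_b
    df_tr, df_b, df_e = k - 1, h - 1, (k - 1) * (h - 1)
    ms_e = ss_e / df_e
    return {"SS": (ss_tr, ss_b, ss_e, ss_total),
            "df": (df_tr, df_b, df_e),
            "F_tr": (ss_tr / df_tr) / ms_e,
            "F_b": (ss_b / df_b) / ms_e}

problem1 = [[13, 7, 9, 3], [6, 6, 3, 1], [11, 5, 15, 5]]
forms = [[75, 73, 59, 69, 84], [83, 72, 56, 70, 92],
         [86, 61, 53, 72, 88], [73, 67, 62, 79, 95]]
for name, data in [("Problem 1", problem1), ("Forms", forms)]:
    r = two_way_anova(data)
    print(name, r["df"], round(r["F_tr"], 3), round(r["F_b"], 3))
```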

Problem 1. Carry out ANOVA two-way classification for the following data.
              Blocks
              1    2    3    4
Treatment 1   13   7    9    3
Treatment 2   6    6    3    1
Treatment 3   11   5    15   5
Null Hypothesis
H0(tr): There is homogeneity among the treatments, i.e. μ1. = μ2. = μ3.
H0(b): There is homogeneity among the blocks, i.e. μ.1 = μ.2 = μ.3 = μ.4
Alternative Hypothesis
H1(tr): There is no homogeneity among the treatments, i.e. the μi. are not all equal
H1(b): There is no homogeneity among the blocks, i.e. the μ.j are not all equal
Appropriate level of significance is 5% (chosen)
              Blocks
              1     2     3     4     Ti       Ti²
Treatment 1   13    7     9     3     32       1024
Treatment 2   6     6     3     1     16       256
Treatment 3   11    5     15    5     36       1296
Bj            30    18    27    9     G = 84   ΣTi² = 2576
Bj²           900   324   729   81             ΣBj² = 2034
$$ \text{Raw sum of squares (R.S.S)} = \sum X_{ij}^2 = 786 $$
$$ \text{C.F} = \frac{G^2}{N} = \frac{84^2}{12} = 588 $$
$$ \text{S.S.T} = S_T^2 = \text{R.S.S} - \text{C.F} = 786 - 588 = 198 $$
$$ \text{S.S.tr} = S_{tr}^2 = \frac{1}{h}\sum T_i^2 - \text{C.F} = \frac{1}{4}(2576) - 588 = 56 $$
$$ \text{S.S.b} = S_b^2 = \frac{1}{k}\sum B_j^2 - \text{C.F} = \frac{1}{3}(2034) - 588 = 90 $$
$$ \text{S.S.E} = S_e^2 = S_T^2 - S_{tr}^2 - S_b^2 = 198 - 56 - 90 = 52 $$

Source of    Sum of    Degrees of   Mean Sum      Variance
Variation    Squares   freedom      of Squares    Ratio
Treatments   56        2            28            F_t = 3.23
Blocks       90        3            30            F_b = 3.46
Error        52        6            8.67
Total        198       11
Inference:-
1. {F_{α,k-1,(k-1)(h-1)} = F_{0.05,2,6} = 5.14}, F_t < F_tab, so we accept H0(tr)
2. {F_{α,h-1,(k-1)(h-1)} = F_{0.05,3,6} = 4.76}, F_b < F_tab, so we accept H0(b)
Problem 2. Carry out ANOVA two-way classification for the following data.
         Student 1   Student 2   Student 3   Student 4   Student 5
Form A   75          73          59          69          84
Form B   83          72          56          70          92
Form C   86          61          53          72          88
Form D   73          67          62          79          95
Null Hypothesis
H0(tr): There is homogeneity among the Forms, i.e. μ1 = μ2 = μ3 = μ4
H0(b): There is homogeneity among the Students, i.e. μ1 = μ2 = μ3 = μ4 = μ5
Alternative Hypothesis
H1(tr): There is no homogeneity among the Forms, i.e. the Form means are not all equal
H1(b): There is no homogeneity among the Students, i.e. the Student means are not all equal
Appropriate level of significance is 5% (chosen)
         S1       S2      S3      S4      S5       Ti         Ti²
Form A   75       73      59      69      84       360        129600
Form B   83       72      56      70      92       373        139129
Form C   86       61      53      72      88       360        129600
Form D   73       67      62      79      95       376        141376
Bj       317      273     230     290     359      G = 1469   ΣTi² = 539705
Bj²      100489   74529   52900   84100   128881              ΣBj² = 440899
$$ \text{Raw sum of squares (R.S.S)} = \sum X_{ij}^2 = 110607 $$
$$ \text{C.F} = \frac{G^2}{N} = \frac{1469^2}{20} = 107898.05 $$
$$ \text{S.S.T} = S_T^2 = \text{R.S.S} - \text{C.F} = 110607 - 107898.05 = 2708.95 $$
$$ \text{S.S.tr} = S_{tr}^2 = \frac{1}{h}\sum T_i^2 - \text{C.F} = \frac{539705}{5} - 107898.05 = 42.95 $$
$$ \text{S.S.b} = S_b^2 = \frac{1}{k}\sum B_j^2 - \text{C.F} = \frac{440899}{4} - 107898.05 = 2326.70 $$
$$ \text{S.S.E} = S_e^2 = S_T^2 - S_{tr}^2 - S_b^2 = 2708.95 - 42.95 - 2326.70 = 339.3 $$
Source of    Sum of    Degrees of   Mean Sum      Variance
Variation    Squares   freedom      of Squares    Ratio
Treatments   42.95     3            14.3167       F_t = 0.506
Blocks       2326.7    4            581.675       F_b = 20.572
Error        339.3     12           28.275
Total        2708.95   19
Inference:-
1. Since F_t < 1, take the reciprocal F_t = s_e²/s_tr² = 28.275/14.3167 ≈ 1.975 with (12, 3) d.f.
{F_{0.05,12,3} = 8.74}, F_t < F_tab, so we accept H0(tr)
2. {F_{α,h-1,(k-1)(h-1)} = F_{0.05,4,12} = 3.26}, F_b > F_tab, so we reject H0(b)
CORRELATION & REGRESSION

Problem1:- The I.Q.s of a group of 6 persons are measured, and they
then sat for a certain examination. Their I.Q.s and examination marks
are as follows. Find the rank correlation coefficient.
Person   I.Q.   Exam marks   xi (rank of I.Q.)   yi (rank of marks)   di = xi - yi   di²
A        110    70           3                   3                     0             0
B        100    60           4                   4                     0             0
C        140    80           1                   2                    -1             1
D        120    90           2                   1                     1             1
E        80     10           6                   6                     0             0
F        90     20           5                   5                     0             0
                                                                       Σdi² = 2

$$ \rho = 1 - \frac{6\sum d_i^2}{n(n^2 - 1)} = 1 - \frac{6 \times 2}{6(36 - 1)} = 1 - 0.0571 = 0.9429 $$

Problem2:- Obtain the rank correlation coefficient for the given data.
(Ranks are given from the largest value downwards; tied values share the average of their ranks.)

X    Y    xi     yi     di = xi - yi   di²
68   62   4      5      -1             1
64   58   6      7      -1             1
75   68   2.5    3.5    -1             1
50   45   9      10     -1             1
64   81   6      1       5             25
80   60   1      6      -5             25
75   68   2.5    3.5    -1             1
40   48   10     9       1             1
55   50   8      8       0             0
64   70   6      2       4             16
                                       Σd² = 72

In the x-series 75 has occurred twice, so C.F = m(m² - 1)/12 = 2(4 - 1)/12 = 0.5
In the x-series 64 has occurred thrice, so C.F = m(m² - 1)/12 = 3(9 - 1)/12 = 2
In the y-series 68 has occurred twice, so C.F = m(m² - 1)/12 = 0.5

Now corrected Σd² = 72 + 0.5 + 2 + 0.5 = 75

$$ \rho = 1 - \frac{6\sum d_i^2}{n(n^2 - 1)} = 1 - \frac{6 \times 75}{10(100 - 1)} = 1 - 0.4545 = 0.545 $$
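The ranking rule, the tie correction m(m² - 1)/12 and the formula for ρ can be combined into a short sketch. The function names are hypothetical; ranks are assigned from the largest value (rank 1) as in the tables above, and each group of m tied values adds m(m² - 1)/12 to Σd².

```python
from collections import Counter

def average_ranks(values: list[float]) -> list[float]:
    """Rank from largest (rank 1) to smallest; tied values get the mean of their ranks."""
    ordered = sorted(values, reverse=True)
    ranks = []
    for v in values:
        first = ordered.index(v) + 1  # 1-based rank of the first copy of v
        ties = ordered.count(v)
        ranks.append(first + (ties - 1) / 2)
    return ranks

def tie_correction(values: list[float]) -> float:
    """Sum of m(m^2 - 1)/12 over each group of m tied values."""
    return sum(m * (m * m - 1) / 12 for m in Counter(values).values() if m > 1)

def spearman_rho(x: list[float], y: list[float]) -> float:
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(average_ranks(x), average_ranks(y)))
    d2 += tie_correction(x) + tie_correction(y)  # corrected sum of d^2
    return 1 - 6 * d2 / (n * (n * n - 1))

iq = [110, 100, 140, 120, 80, 90]
marks = [70, 60, 80, 90, 10, 20]
x = [68, 64, 75, 50, 64, 80, 75, 40, 55, 64]
y = [62, 58, 68, 45, 81, 60, 68, 48, 50, 70]
print(round(spearman_rho(iq, marks), 4))  # 0.9429
print(round(spearman_rho(x, y), 4))       # 0.5455
```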
MISC
1. In a partially destroyed laboratory record of an analysis of
correlation data, only the following results are legible:
Variance of X = 9
Regression equations: 8X - 10Y + 66 = 0 and 40X - 18Y = 214
a) What were the mean values of X and Y?
b) What is the correlation coefficient between X and Y?
c) What is the standard deviation of Y?

2. Determine the two regression lines:
           x     y
Average    7.6   14.8
S.D        3.6   2.5
Correlation coefficient is 0.9

3. Determine the regression coefficients and correlation
coefficient from the following data:
x   2   5   4   7    3   9
y   6   8   7   10   4   8

4. Determine the regression lines and estimate the value
of x when y = 15
x   2   5   4   8   7   6   10
y   3   8   3   7   9   8   11

5. (1994) Determine the regression lines for the given data:
x   5    6    7   9   12
y   13   11   9   8   6
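For exercises 3 to 5, the means, the two regression coefficients and r can be computed from raw data with the deviation formulas byx = Sxy/Sxx, bxy = Sxy/Syy and r = Sxy/√(Sxx·Syy). This is a generic sketch rather than a worked solution from the notes, and `regression_summary` is a hypothetical name. It also illustrates the identity byx · bxy = r².

```python
import math

def regression_summary(x: list[float], y: list[float]) -> dict:
    """Means, regression coefficients byx, bxy and correlation r from paired data."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return {
        "mean_x": mx, "mean_y": my,
        "byx": sxy / sxx,  # line of y on x: y - mean_y = byx (x - mean_x)
        "bxy": sxy / syy,  # line of x on y: x - mean_x = bxy (y - mean_y)
        "r": sxy / math.sqrt(sxx * syy),
    }

s = regression_summary([2, 5, 4, 7, 3, 9], [6, 8, 7, 10, 4, 8])  # data of exercise 3
print({k: round(v, 4) for k, v in s.items()})
```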

TIME SERIES ANALYSIS

A time series is a set of data collected over a period of time. Our purpose is to see what
changes take place over time in the event we are observing. We can try to predict the
future behavior of that event from the available data, hoping that the recent behavior of the
series carries more weight than its earlier behavior.
Forecasting or predicting is an essential tool in any decision-making process. Time series
analysis is a quantitative method that we use to determine patterns in data collected over
a period of time.
Analysis of a time series helps us to understand the past behavior of the data. With the
knowledge of the past behavior, it is possible, within certain limits, to forecast the
probable future variations of the data. Thus it helps in planning future operations.
A time series is made up of three components:
1. Long term variations (Trend)
2. Short term variations (periodic changes)
   i) Seasonal variations
   ii) Cyclical variations
3. Random or irregular variations

TREND
Trend means the general tendency of the data to increase or decrease during a long period of time.
The term "long period of time" is relative and cannot be exactly defined. In some cases
a week may be a long period of time, while in others a period as long as two years
may be a short period.
For example, bacteria multiply appreciably every five minutes, so a week is a long period
in the study of bacteria, while 2 years is a short period in the study of agricultural production.
So what counts as a trend depends on the case under study.
Reasons for studying Trend:
1. The study of secular trends allows us to describe a historical pattern.
2. Studying secular trends permits us to project past patterns or trends into the future.

Various measures of trend
i) Graphical method
ii) Method of semi-averages
iii) Method of least squares
iv) Method of moving averages

1) Graphical (Free-hand) Method:
The free-hand or graphic method is the simplest method for studying trend. In this
method the actual figures (given data) are first plotted as points on graph paper, with
the time series values along the vertical axis and time along the horizontal axis. Then a
straight line is drawn to fit the plotted points as closely as possible
(to draw the line, leave an equal number of points on both sides of it at more or less equal
distances). The line so obtained shows the direction of the trend, and its height above
each time period gives the trend value for that period.
By this method a quick estimate of the trend can be obtained, but it depends too much
on the judgment of the investigator. Different people will locate the line in different
positions. This method should be used only when a quick, approximate idea of the trend
is required.
Fit a trend line to the following data by the free-hand method
Year                    1970   1971   1972   1973   1974   1975   1976   1977
Sales (in million Rs)   62     64     66     63.5   67     64.5   69     67

[Figure: TREND LINE (free-hand method); x-axis: Year, y-axis: Sales in millions]

2) Semi-Average Method
In the semi-average method, the given data are first divided into two equal parts and an average
for each part is found. Then these two averages are plotted on the graph paper against the
mid-points of the time intervals covered by the respective two parts. These two points are
joined by a straight line. This straight line is the required trend line.
Although this method is simple to apply, it may lead to poor results when used
indiscriminately. It is applicable only where the trend is linear or approximately linear.

Draw a trend line by the semi-average method using the following data:
Year                                 1973   1974   1975   1976   1977   1978
Production of steel (in lakh tons)   253    260    255    263    259    264

Sol:- The average production of steel for the first three years = (253 + 260 + 255)/3 = 256
The average production of steel for the last three years = (263 + 259 + 264)/3 = 262
Thus we get two points, 256 and 262, which are plotted against the respective
middle years 1974 and 1977 of the two parts 1973-1975 and 1976-1978. By
joining these two points, the required trend line is obtained.
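The two plotted points and the slope of the semi-average line can be computed as below. This is a sketch; `semi_average_trend` is a hypothetical name, and the handling of an odd number of years (leaving the middle year out of both halves) is an assumption, since the notes only show an even-length example.

```python
def semi_average_trend(years: list[int], values: list[float]):
    """Return ((mid_year1, avg1), (mid_year2, avg2), slope) for the semi-average method."""
    half = len(values) // 2
    # Assumption: with an odd number of years the middle year is left out of both halves.
    first_years, second_years = years[:half], years[-half:]
    first_vals, second_vals = values[:half], values[-half:]
    p1 = (sum(first_years) / half, sum(first_vals) / half)
    p2 = (sum(second_years) / half, sum(second_vals) / half)
    slope = (p2[1] - p1[1]) / (p2[0] - p1[0])  # change in trend per year
    return p1, p2, slope

p1, p2, slope = semi_average_trend([1973, 1974, 1975, 1976, 1977, 1978],
                                   [253, 260, 255, 263, 259, 264])
print(p1, p2, slope)  # (1974.0, 256.0) (1977.0, 262.0) 2.0
```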

[Figure: SEMI AVERAGE METHOD; x-axis: Year, y-axis: Production of steel]

3) Method of least squares:
This method is widely used for the measurement of trend.
Linear Trend:- Let (x1, y1), (x2, y2), ..., (xn, yn) be n pairs of observations, where yi
represents the time series and xi represents time.
Let Y = a + bX be the equation of the straight line. The normal equations are
$$ \sum y = na + b\sum x $$
$$ \sum xy = a\sum x + b\sum x^2 $$
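Solving the two normal equations is a 2 × 2 linear system, so it can be sketched with Cramer's rule after coding time as X = year - origin (the origin being the middle year chosen in the problems that follow). The name `linear_trend` is hypothetical.

```python
def linear_trend(years: list[int], values: list[float], origin: int) -> tuple[float, float]:
    """Least-squares line Y = a + bX with X = year - origin, from the two normal equations."""
    X = [t - origin for t in years]
    n = len(X)
    sx, sy = sum(X), sum(values)
    sxx = sum(x * x for x in X)
    sxy = sum(x * y for x, y in zip(X, values))
    # sum(y) = n*a + b*sum(x);  sum(xy) = a*sum(x) + b*sum(x^2)  -> Cramer's rule
    det = n * sxx - sx * sx
    a = (sy * sxx - sx * sxy) / det
    b = (n * sxy - sx * sy) / det
    return a, b

a, b = linear_trend([1974, 1975, 1976, 1977, 1978], [35, 56, 79, 80, 40], origin=1976)
print(a, b)  # 58.0 3.4
```

With a different origin (for example 1997 for the 1995-2000 data), the estimate for a later year is a + b*(year - origin).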
Problem 1:-
Determine the equation of a straight line which best fits the following data:
Year              1974   1975   1976   1977   1978
Sales (in '000)   35     56     79     80     40
Sol:-
Year    Sales   X = Year - 1976   Y     XY    X²
1974    35      -2                35    -70   4
1975    56      -1                56    -56   1
1976    79       0                79     0    0
1977    80       1                80     80   1
1978    40       2                40     80   4
Total            0                290    34   10
5a + 0b = 290
0a + 10b = 34, so a = 58, b = 3.4
Required straight line equation is:
Y = 58 + 3.4(X - 1976), where X is the year
Y = -6660.4 + 3.4X
Problem2:-
Fit a straight line to the following data:
X   1981   1982   1983   1984   1985   1986   1987   1988   1989   1990
Y   42     50     61     75     92     111    120    127    140    138
Sol:-
X       Y     X' = X - 1986   X'Y    X'²
1981    42    -5              -210   25
1982    50    -4              -200   16
1983    61    -3              -183   9
1984    75    -2              -150   4
1985    92    -1              -92    1
1986    111    0               0     0
1987    120    1               120   1
1988    127    2               254   4
1989    140    3               420   9
1990    138    4               552   16
Total   956   -5               511   85
10a - 5b = 956
-5a + 85b = 511, so a = 101.5939, b = 11.9878
Y = 101.5939 + 11.9878(X - 1986)
Y = -23706.177 + 11.9878X

Problem3:-
Fit a trend line to the following data of sales of a commodity in a shop using the principle of
least squares, and estimate the volume of sales in 1997.
Year              1990   1991   1992   1993   1994
Sales (in '000)   2.4    2.8    3.1    3.6    4.2
Sol:-

X       Y      X' = X - 1992   X'Y    X'²
1990    2.4    -2              -4.8   4
1991    2.8    -1              -2.8   1
1992    3.1     0               0     0
1993    3.6     1               3.6   1
1994    4.2     2               8.4   4
Total   16.1    0               4.4   10

5a = 16.1 and 10b = 4.4, so a = 3.22, b = 0.44
Y = 3.22 + 0.44(X - 1992)
Y = -873.26 + 0.44X
Y(X=1997) = 3.22 + 0.44 × 5 = 5.42 (thousand)
Problem4:- (2002 March)
Mr. Ramesh, the auditor of a Govt. public school, has received the inventory records to
determine whether the current inventory holdings of textbooks are typical. The following
inventory amounts are from the previous 5 years.
Year               1988   1989   1990   1991   1992
Inventory in Rs.   4620   4910   5490   5730   5990
b) Find the linear equation that describes the trend in the inventory holdings.
c) Estimate the value of inventory for the year 1993.

X       Y       X' = X - 1990   X'Y     X'²
1988    4620    -2              -9240   4
1989    4910    -1              -4910   1
1990    5490     0               0      0
1991    5730     1               5730   1
1992    5990     2               11980  4
Total   26740    0               3560   10

5a = 26740 and 10b = 3560, so a = 5348, b = 356
Y = 5348 + 356(X - 1990)
Y = -703092 + 356X
Y(X=1993) = 6416

Problem5:-
Fit a straight line by the principle of least squares to the data. Estimate the likely production
for the year 2005.
X   1995   1996   1997   1998   1999   2000
Y   24     25     27     29     30     33
X       Y     X' = X - 1997   X'Y   X'²
1995    24    -2              -48   4
1996    25    -1              -25   1
1997    27     0               0    0
1998    29     1               29   1
1999    30     2               60   4
2000    33     3               99   9
Total   168    3               115  19

6a + 3b = 168
3a + 19b = 115, so a = 27.1143, b = 1.7714
Y = 27.1143 + 1.7714(X - 1997)
Y = -3510.3715 + 1.7714X
Y(X=2005) = 41.2855

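Non Linear Trend

Problem1:-
Fit a parabolic trend of the form Y = a + bX + cX² to the following data. The normal equations are
$$ \sum Y = na + b\sum X + c\sum X^2 $$
$$ \sum XY = a\sum X + b\sum X^2 + c\sum X^3 $$
$$ \sum X^2Y = a\sum X^2 + b\sum X^3 + c\sum X^4 $$

Year    Y     X = Year - 1987   XY    X²   X³   X⁴   X²Y
1985    13    -2                -26   4    -8   16   52
1986    24    -1                -24   1    -1   1    24
1987    39     0                 0    0     0   0    0
1988    65     1                 65   1     1   1    65
1989    106    2                 212  4     8   16   424
Total   247    0                 227  10    0   34   565

5a + 10c = 247, 10b = 227, 10a + 34c = 565, so a = 39.2571, b = 22.7, c = 5.0714
Y = 39.2571 + 22.7(X - 1987) + 5.0714(X - 1987)²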
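The three normal equations of the parabolic trend form a 3 × 3 linear system; a small Cramer's-rule sketch reproduces a, b and c above. `det3` and `parabolic_trend` are hypothetical helper names, and time is coded as X = year - origin as in the table.

```python
def det3(m: list[list[float]]) -> float:
    """Determinant of a 3x3 matrix."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def parabolic_trend(years: list[int], values: list[float], origin: int):
    """Fit Y = a + bX + cX^2 (X = year - origin) by solving the three normal equations."""
    X = [t - origin for t in years]

    def sx(p: int) -> float:   # sum of X^p (0 ** 0 == 1 in Python, so sx(0) == n)
        return sum(x ** p for x in X)

    def sxy(p: int) -> float:  # sum of X^p * Y
        return sum(x ** p * y for x, y in zip(X, values))

    A = [[sx(0), sx(1), sx(2)],
         [sx(1), sx(2), sx(3)],
         [sx(2), sx(3), sx(4)]]
    rhs = [sxy(0), sxy(1), sxy(2)]
    D = det3(A)
    coeffs = []
    for col in range(3):  # Cramer's rule: replace one column by the right-hand side
        M = [row[:] for row in A]
        for i in range(3):
            M[i][col] = rhs[i]
        coeffs.append(det3(M) / D)
    return tuple(coeffs)

a, b, c = parabolic_trend([1985, 1986, 1987, 1988, 1989], [13, 24, 39, 65, 106], origin=1987)
print(round(a, 4), round(b, 4), round(c, 4))  # 39.2571 22.7 5.0714
```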


Problem2:- (April 2001)
The sales of a company in lakhs of rupees for the years 1975 through 1981 are
given below.
Year    1975   1976   1977   1978   1979   1980   1981
Sales   32     47     65     92     132    190    275

Estimate the sales figure for the year 1982 using an equation of the form Y = ab^X,
where X represents years and Y represents sales.
Taking logarithms, log Y = log a + X log b = A + BX, which is a straight line in X, so A and B
are obtained from the linear normal equations applied to log Y.

YEAR    SALES   X = Year - 1978   log Y     X.log Y   X²
1975    32      -3                1.5051    -4.5153   9
1976    47      -2                1.6720    -3.3440   4
1977    65      -1                1.8129    -1.8129   1
1978    92       0                1.9637     0        0
1979    132      1                2.1205     2.1205   1
1980    190      2                2.2787     4.5574   4
1981    275      3                2.4393     7.3179   9
Total            0                13.7923    4.3236   28

A = 1.9704; B = 0.1544

log Y = 1.9704 + 0.1544(X - 1978)

Y = 93.4114 × (1.4269)^(X - 1978)

log Y(X=1982) = 1.9704 + 0.1544 × 4 = 2.5880

Y = antilog(2.5880) ≈ 387.26
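The log-linear fit can be sketched by applying the straight-line normal equations to log10(Y) and converting back with antilogs. The name `exponential_trend` is hypothetical; unrounded logarithms are used, so a and the forecast may differ slightly from the four-figure table values.

```python
import math

def exponential_trend(years: list[int], values: list[float], origin: int):
    """Fit Y = a * b**X (X = year - origin) by least squares on log10(Y)."""
    X = [t - origin for t in years]
    logs = [math.log10(y) for y in values]
    n = len(X)
    sx, sy = sum(X), sum(logs)
    sxx = sum(x * x for x in X)
    sxy = sum(x * v for x, v in zip(X, logs))
    det = n * sxx - sx * sx
    A = (sy * sxx - sx * sxy) / det  # A = log10(a)
    B = (n * sxy - sx * sy) / det    # B = log10(b)
    return 10 ** A, 10 ** B

years = list(range(1975, 1982))
sales = [32, 47, 65, 92, 132, 190, 275]
a, b = exponential_trend(years, sales, origin=1978)
forecast_1982 = a * b ** (1982 - 1978)
print(round(a, 2), round(b, 4), round(forecast_1982, 1))
```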

4) Method of Moving Averages:

It consists in measuring the trend by smoothing out the fluctuations of
the data by means of a moving average. A moving average of extent m is a series of
successive averages of m terms at a time, starting with the 1st, 2nd, 3rd, ... terms.
Thus the first average is the mean of the 1st to mth terms, the second is the mean of the 2nd to
(m+1)th terms, the third is the mean of the 3rd to (m+2)th terms, and so on.
If m is odd, say m = 2k + 1, each moving average is placed against the mid-value of
the time interval it covers, i.e. against t = k + 1. If m is even, say m = 2k, it is placed between the two
middle values of the time interval it covers, i.e. between t = k and t = k + 1. In the latter case the
moving average does not coincide with an original time period, so the moving averages are
synchronized with the original data by centering them: a moving average of extent two is taken
of the moving averages, and these values are put against t = k + 1. The graph obtained by plotting
the moving averages against time gives the trend.

Problem1:-
Work out the centered 4-yearly moving averages for the following data
Year   T      Four-year          Two-year moving
              moving average     average (centered)
1971   2204
1972   2500
              2436
1973   2360                      2463.50
              2491
1974   2680                      2507.75
              2524.50
1975   2424                      2592.50
              2660.50
1976   2634                      2712.75
              2765
1977   2904                      2858.50
              2952
1978   3098                      2991.75
              3031.50
1979   3172                      3074.50
              3117.50
1980   2952                      3126.75
              3136
1981   3248
1982   3172

[Figure: MOVING AVERAGES (4-yearly, centered); x-axis: Year, y-axis: Data]

problem2:-
Calculate the 5-yearly moving averages for the following data
Year T 5-yearly moving average
1966 19.3
1967 20.9
1968 17.8 18.34
1969 16.1 18.04
1970 17.6 17.52
1971 17.8 17.42
1972 18.3 18.48
1973 17.3 18.82
1974 21.4 18.88
1975 19.3 19.12
1976 18.1 19.50
1977 19.5 19.66
1978 19.2 19.98
1979 22.2 20.66
1980 20.9 21.14
1981 21.5
1982 21.9

[Figure: MOVING AVERAGES (5-yearly); x-axis: Year, y-axis: Data]

Problem3:-
The data below give the index of industrial production from 1961 to 1970.
Determine the trend by means of moving averages: i) 3-yearly ii) 5-yearly
Year                  1961    1962    1963    1964    1965    1966    1967    1968    1969    1970
Index of production   109.2   119.8   129.7   140.8   153.8   153.2   152.6   163.0   175.3   184.3

Year Index 3 yr avg
1961 109.2
1962 119.8 119.57
1963 129.7 130.10
1964 140.8 141.43
1965 153.8 149.27
1966 153.2 153.2
1967 152.6 156.27
1968 163.0 163.63
1969 175.3 174.2
1970 184.3

[Figure: 3 YEARLY MOVING AVERAGES; x-axis: Year, y-axis: Index]
Year Index 5 yr avg
1961 109.2
1962 119.8
1963 129.7 130.66
1964 140.8 139.46
1965 153.8 146.02
1966 153.2 152.68
1967 152.6 159.58
1968 163.0 165.68
1969 175.3
1970 184.3

[Figure: 5 YEARLY MOVING AVERAGES; x-axis: Year, y-axis: Index]

Algorithm:-
Step1: Enter the no. of observations, i.e. n
Step2: Read all the paired observations (Ut, t)
Step3: Enter the extent of the moving average, i.e. m
Step4: Calculate the moving averages
M1 = (U1 + U2 + ... + Um)/m
M2 = (U2 + U3 + ... + Um+1)/m
.
.
Mq = (Uq + Uq+1 + ... + Un)/m, where q = n - m + 1

Step5: If m is odd (m = 2k + 1) go to Step6; else if m is even (m = 2k) go to Step7
Step6: Place each moving average against the mid-value of the time interval it covers.
Step7: Place each moving average between the kth and (k+1)th time points of the interval it covers.
Step8: Synchronize (center) the moving averages so that each average is against a time period.
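The algorithm above can be sketched in Python. The function names are hypothetical; for even m the centring is done with a moving average of extent two, as described in the text, and the result is aligned with the original periods (None where no average exists).

```python
def moving_averages(values: list[float], m: int) -> list[float]:
    """M1, M2, ..., Mq with Mi = mean of m consecutive values (q = n - m + 1)."""
    return [sum(values[i:i + m]) / m for i in range(len(values) - m + 1)]

def trend_by_moving_average(values: list[float], m: int) -> list:
    """Place (and, for even m, centre) the moving averages against the original periods.

    Returns a list aligned with `values`; positions with no average hold None.
    """
    n, k = len(values), m // 2
    ma = moving_averages(values, m)
    if m % 2 == 0:
        # Even extent m = 2k: averages fall between periods, so centre them
        # with a moving average of extent two (Step 8).
        ma = [(p + q) / 2 for p, q in zip(ma, ma[1:])]
    trend = [None] * n
    for i, v in enumerate(ma):
        trend[i + k] = v  # first (centred) average sits against period k (0-based)
    return trend

sales = [2204, 2500, 2360, 2680, 2424, 2634, 2904, 3098, 3172, 2952, 3248, 3172]
print(trend_by_moving_average(sales, 4))
```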

Short term variations:

a) Seasonal variations: Nature and custom are responsible for these variations. These
variations are due to rhythmic forces which operate in a regular and periodic manner over a
span of less than one year. Thus seasonal variations in a time series will exist only if the data are
recorded quarterly, monthly, weekly, daily, hourly and so on; in a time series where
only annual figures are given, there are no seasonal variations. The following are responsible
for seasonal variations:
i) Those resulting from natural forces: Weather and climatic changes play an important role in
seasonal movements. For instance, the sale of umbrellas picks up very fast in the rainy season, the
demand for electric fans goes up in summer and the sale of ice cream increases in summer.
ii) Those resulting from man-made conventions: These variations in a time series within a period
of 12 months are due to the habits, fashions, customs and conventions of the people in the society.
For example, the sale of gold ornaments goes up during marriages and festivals. These
variations operate in a regular manner.

b) Cyclical variations: The oscillatory movements in a time series with a period of oscillation of
more than one year are known as cyclical variations. One complete period is known as a
cycle, often called the business cycle, which has four phases:
prosperity (boom), recession, depression and recovery. Generally these cycles last
from seven to eleven years.

Various measures of seasonal variations:
1. Method of simple averages
2. Ratio to trend method
3. Ratio to moving average method
4. Link relative method

1. Method of simple averages

The data below give the average quarterly prices of a commodity for four years.
Calculate the seasonal variation indices.

Year   I quarter   II quarter   III quarter   IV quarter
1980   40.3        44.8         46.0          48.0
1981   50.1        53.1         55.3          59.5
1982   47.2        50.1         52.1          55.2
1983   55.4        59.0         61.6          65.3

Sol:-

Year             I quarter   II quarter   III quarter   IV quarter
1980             40.3        44.8         46.0          48.0
1981             50.1        53.1         55.3          59.5
1982             47.2        50.1         52.1          55.2
1983             55.4        59.0         61.6          65.3
Average          48.25       51.75        53.75         57.00       Overall average x̄ = 52.69
Seasonal Index   91.58       98.22        102.02        108.19

Algorithm:-
1. Read the data
2. Compute the averages x̄_i for the IQ, IIQ, IIIQ & IVQ
3. Compute the overall average x̄ (the mean of the four quarterly averages)
4. Compute the seasonal index of the ith quarter by
$$ \text{S.I}_i = \frac{\bar{x}_i}{\bar{x}} \times 100 $$
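The four steps of the simple-average method fit in a few lines. This is a sketch with a hypothetical function name; it returns one index per column, and the indices always add up to 100 times the number of seasons (400 for quarterly data).

```python
def seasonal_indices_simple_average(table: list[list[float]]) -> list[float]:
    """table: one row per year, one column per season (e.g. 4 quarters)."""
    season_means = [sum(col) / len(col) for col in zip(*table)]
    overall = sum(season_means) / len(season_means)  # overall average x-bar
    return [100 * m / overall for m in season_means]

prices = [[40.3, 44.8, 46.0, 48.0],
          [50.1, 53.1, 55.3, 59.5],
          [47.2, 50.1, 52.1, 55.2],
          [55.4, 59.0, 61.6, 65.3]]
print([round(si, 2) for si in seasonal_indices_simple_average(prices)])  # [91.58, 98.22, 102.02, 108.19]
```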

2. Ratio to trend method:

problem1:-
Find the quarterly indices by the Ratio to trend method for the following data

Table 1:- (Ut = yearly average of the four quarters)
Year    IQ   IIQ   IIIQ   IVQ   Ut      T = t - 1980   Ut.T    T²
1978    10   27    21     40    24.5    -2             -49     4
1979    11   35    29     57    33      -1             -33     1
1980    14   51    33     74    43       0              0      0
1981    19   57    43     78    49.25    1              49.25  1
1982    22   67    45     101   58.75    2              117.5  4
Total                           208.5    0              84.75  10

Ut = a + bT
ΣUt = na + bΣT
ΣT.Ut = aΣT + bΣT², so a = 41.7, b = 8.475

Ut = 41.7 + 8.475T
Ut = 41.7 + 8.475(t - 1980)
Ut = -16738.8 + 8.475t

Table2:-
Year           1978    1979     1980   1981     1982
Trend values   24.75   33.225   41.7   50.175   58.65

Quarterly increment (Q.I) = b/4 = 8.475/4 = 2.11875

IIQ = T.V - ½Q.I
IIIQ = T.V + ½Q.I
IQ = IIQ - Q.I
IVQ = IIIQ + Q.I

Quarterly Trend values

Table3:-
YEAR   IQ        IIQ       IIIQ      IVQ
1978   21.5718   23.6906   25.8093   27.9280
1979   30.0468   32.1656   34.2843   36.4030
1980   38.5218   40.6406   42.7593   44.8780
1981   46.9968   49.1156   51.2343   53.3530
1982   55.4718   57.5906   59.7093   61.8280

Table 4 (percentage table: original value / quarterly trend value × 100):-
YEAR            IQ        IIQ        IIIQ      IVQ
1978            46.3568   113.9692   81.3660   143.2250
1979            36.6095   108.8118   84.5868   156.5805
1980            36.3430   125.4902   77.1761   164.8914
1981            40.4282   116.0527   83.9281   146.1969
1982            39.6597   116.3384   75.3650   163.3560
AVG S.I         39.8794   116.1325   80.4844   154.8500   Total 391.3463
Corrected S.I   40.7613   118.7005   82.2641   158.2742   Total 400.0001

Problem2:-
Calculate the seasonal variation for the following data of sales (in thousand Rs.) of a firm by the
Ratio to Trend Method
Table 1:-
Year    IQ   IIQ   IIIQ   IVQ   Ut   T = t - 1981   Ut.T   T²
1979    30   40    36     34    35   -2             -70    4
1980    34   52    50     44    45   -1             -45    1
1981    40   58    54     48    50    0              0     0
1982    54   76    68     62    65    1              65    1
1983    80   92    86     82    85    2              170   4
Total                           280   0              120   10

Ut = a + bT
ΣUt = na + bΣT
ΣT.Ut = aΣT + bΣT², so a = 56, b = 12

Ut = 56 + 12T
Ut = 56 + 12(t - 1981)
Ut = -23716 + 12t

Table2:-
Year           1979   1980   1981   1982   1983
Trend values   32     44     56     68     80

Quarterly increment (Q.I) = b/4 = 12/4 = 3

IIQ = T.V - ½Q.I
IIIQ = T.V + ½Q.I
IQ = IIQ - Q.I
IVQ = IIIQ + Q.I

Quarterly Trend values

Table3:-
YEAR   IQ     IIQ    IIIQ   IVQ
1979   27.5   30.5   33.5   36.5
1980   39.5   42.5   45.5   48.5
1981   51.5   54.5   57.5   60.5
1982   63.5   66.5   69.5   72.5
1983   75.5   78.5   81.5   84.5

Table 4 (percentage table):-
YEAR            IQ        IIQ        IIIQ       IVQ
1979            109.1     131.1      107.5      93.1
1980            86.1      122.4      109.9      90.7
1981            77.7      106.4      93.9       79.3
1982            85.0      114.3      97.8       85.5
1983            106.0     117.1      105.5      97.0
AVG S.I         92.78     118.26     102.92     89.12     Total 403.08
Corrected S.I   92.0711   117.3564   102.1336   88.4390   Total 400.0001

Algorithm :-

Step1: Read the values
Step2: Calculate the yearly average values Ut from the quarterly data
Step3: Fit a straight line of the form Ut = a + bt to the yearly averages
Step4: Calculate the yearly trend values
Step5: Calculate the quarterly increment by Q.I = b/4
Step6: Find the quarterly trend values by
IIQ = T.V - ½Q.I
IIIQ = T.V + ½Q.I
IQ = IIQ - Q.I
IVQ = IIIQ + Q.I
Step7: Calculate the percentage values by the formula (original value / quarterly trend value) × 100
Step8: Calculate the average seasonal indices
Step9: Corrected seasonal indices are obtained by (avg S.I / total of avg S.I) × 400
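Steps 1 to 9 can be sketched as one function for quarterly data. This is an illustrative sketch, not the notes' own program: `ratio_to_trend` is a hypothetical name, the origin year is passed in explicitly (1980 for Problem 1, 1981 for Problem 2), and percentages are kept unrounded, so results can differ from the one-decimal hand tables by a few hundredths.

```python
def ratio_to_trend(table: list[list[float]], first_year: int, origin: int):
    """Seasonal indices (quarterly data, one row per year) by the ratio-to-trend method."""
    T = [first_year + i - origin for i in range(len(table))]
    U = [sum(row) / 4 for row in table]               # yearly averages Ut
    n, st, su = len(T), sum(T), sum(U)
    stt = sum(t * t for t in T)
    stu = sum(t * u for t, u in zip(T, U))
    det = n * stt - st * st
    a = (su * stt - st * stu) / det                   # normal equations for Ut = a + bT
    b = (n * stu - st * su) / det
    qi = b / 4                                        # quarterly increment
    ratios = [[], [], [], []]
    for t, row in zip(T, table):
        tv = a + b * t                                # yearly trend value
        q2, q3 = tv - qi / 2, tv + qi / 2
        quarterly_trend = [q2 - qi, q2, q3, q3 + qi]
        for q in range(4):
            ratios[q].append(100 * row[q] / quarterly_trend[q])
    avg = [sum(r) / len(r) for r in ratios]
    total = sum(avg)
    return avg, [400 * s / total for s in avg]        # average and corrected S.I

sales = [[30, 40, 36, 34], [34, 52, 50, 44], [40, 58, 54, 48],
         [54, 76, 68, 62], [80, 92, 86, 82]]
avg, corrected = ratio_to_trend(sales, first_year=1979, origin=1981)
print([round(v, 2) for v in corrected])
```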
Ratio to Moving Averages: A resort hotel wanted to establish the seasonal pattern of room
demand by its clientele. Hotel management wants to improve customer service and is
considering several plans to employ personnel during peak periods to achieve this goal. The
table contains the occupancy during each quarter of the last four years. Calculate the S.I.
by the ratio to moving averages method.
YEAR   IQ    IIQ   IIIQ   IVQ
1980   75    60    54     59
1981   86    65    63     80
1982   90    72    66     85
1983   100   78    72     93
Sol:-
Year   Quarter   Ut    Quarterly      Adjusted (centred)
                       moving avg.    moving avg.
1980   IQ        75
       IIQ       60
                       62
       IIIQ      54                   63.375
                       64.75
       IVQ       59                   65.375
                       66
1981   IQ        86                   67.125
                       68.25
       IIQ       65                   70.875
                       73.5
       IIIQ      63                   74
                       74.5
       IVQ       80                   75.375
                       76.25
1982   IQ        90                   76.625
                       77
       IIQ       72                   77.625
                       78.25
       IIIQ      66                   79.5
                       80.75
       IVQ       85                   81.5
                       82.25
1983   IQ        100                  83
                       83.75
       IIQ       78                   84.75
                       85.75
       IIIQ      72
       IVQ       93

Percentage table (Ut / adjusted moving average × 100):
YEAR       IQ         IIQ       IIIQ      IVQ
1980       ---        ---       85.2071   90.2485
1981       128.1192   91.7108   85.1351   106.1360
1982       117.4551   92.7536   83.0189   104.2945
1983       120.4819   92.0354   ---       ---
Avg S.I    122.0187   92.1666   84.4537   100.2263   Total 398.865
Adj. S.I   122.3659   92.4288   84.6940   100.5115   Total 400.000

Algorithm:-
Step1: Read the data
Step2: Calculate the quarterly (4-quarter) moving averages of the given data
Step3: Adjust (centre) the quarterly moving averages so that each moving average is placed
against a time series value.
Step4: Calculate the percentage table by the formula (original value / corresponding adjusted moving avg.) × 100
Step5: Calculate the quarterly average values, which are known as seasonal indices
Step6: Adjust the seasonal indices by (avg. seasonal index / total of the avg. seasonal indices) × 400
Step7: The adjusted seasonal indices are the seasonal index values for the quarterly data
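The seven steps can be sketched for quarterly data as follows. `ratio_to_moving_average` is a hypothetical name; it assumes the series starts with IQ, so the first centred average falls against IIIQ of the first year, matching the table above.

```python
def ratio_to_moving_average(table: list[list[float]]):
    """Seasonal indices by ratio to centred 4-quarter moving averages.

    table: one row per year with quarters IQ..IVQ; the series is read row by row.
    """
    series = [v for row in table for v in row]
    ma4 = [sum(series[i:i + 4]) / 4 for i in range(len(series) - 3)]
    centred = [(p + q) / 2 for p, q in zip(ma4, ma4[1:])]
    ratios = [[], [], [], []]
    for i, c in enumerate(centred):
        t = i + 2  # centred average i sits against quarter t of the series
        ratios[t % 4].append(100 * series[t] / c)
    avg = [sum(r) / len(r) for r in ratios]
    total = sum(avg)
    return avg, [400 * s / total for s in avg]

occupancy = [[75, 60, 54, 59], [86, 65, 63, 80], [90, 72, 66, 85], [100, 78, 72, 93]]
avg, adjusted = ratio_to_moving_average(occupancy)
print([round(v, 4) for v in avg])
print([round(v, 4) for v in adjusted])
```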

LINK RELATIVE METHOD

Find the Seasonal Indices by the Link Relative Method

YEAR   IQ   IIQ   IIIQ   IVQ
1981   68   60    61     63
1982   70   58    56     60
1983   68   63    68     67
1984   65   59    56     62
1985   60   55    51     58

Sol:-
Link relatives:
YEAR                    IQ         IIQ       IIIQ       IVQ
1981                    ---        88.2353   101.6666   103.2787
1982                    111.1111   82.8571   96.5517    107.1428
1983                    113.3333   92.6470   107.9365   98.5294
1984                    97.0149    90.7692   94.9152    110.7142
1985                    96.7741    91.6666   92.7272    113.7254
Average L.R             104.5582   89.2348   98.7593    106.6779
Chain relatives (C.R)   100        89.2348   88.1277    94.0128
Adjusted C.R (S.I)      100        89.6603   88.9785    95.2890    Total 373.9281
Corrected S.Is          106.972    95.9114   95.1824    101.9327   Total 399.9985

d = ¼{(104.5582 × 94.0128/100) - 100} = -0.4254
Algorithm:-
Step1:- Translate the original data into link relatives, given by
Link Relative of any quarter = (Current quarter figure / Previous quarter figure) × 100
Step2:- Find the average of the link relatives for each quarter.
Step3:- Convert the average link relatives into chain relatives on the basis of the IQ:
Let the Chain Relative of IQ = 100
Chain Relative of IIQ = L.R of IIQ × C.R of IQ / 100
Chain Relative of IIIQ = L.R of IIIQ × C.R of IIQ / 100
Chain Relative of IVQ = L.R of IVQ × C.R of IIIQ / 100

Step4:- Calculate the adjusted C.Rs:
Adjusted C.R of IQ = 100
Adjusted C.R of IIQ = calculated C.R - d
Adjusted C.R of IIIQ = calculated C.R - 2d
Adjusted C.R of IVQ = calculated C.R - 3d

where d = (X1 - 100)/4
and X1 = L.R of IQ × C.R of IVQ / 100 (the chain relative of IQ obtained a second time, from the IVQ)

Step5:- If the sum of the seasonal indices is not equal to 400, adjust them as usual, i.e.
multiply each by 400 / (total of the adjusted C.Rs).
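Steps 1 to 5 can be sketched as a single function for quarterly data. `link_relative_indices` is a hypothetical name; it reads the table year by year as one continuous series, so the first IQ (which has no previous quarter) gets no link relative, exactly as in the table above.

```python
def link_relative_indices(table: list[list[float]]) -> list[float]:
    """Seasonal indices by the link relative method (quarterly data, one row per year)."""
    series = [v for row in table for v in row]
    links = [[], [], [], []]
    for t in range(1, len(series)):
        links[t % 4].append(100 * series[t] / series[t - 1])  # current / previous * 100
    avg_lr = [sum(r) / len(r) for r in links]
    chain = [100.0]                                            # C.R of IQ = 100
    for q in range(1, 4):
        chain.append(avg_lr[q] * chain[-1] / 100)
    x1 = avg_lr[0] * chain[3] / 100                            # IQ chained again from IVQ
    d = (x1 - 100) / 4
    adjusted = [chain[q] - q * d for q in range(4)]
    total = sum(adjusted)
    return [400 * a / total for a in adjusted]

data = [[68, 60, 61, 63], [70, 58, 56, 60], [68, 63, 68, 67],
        [65, 59, 56, 62], [60, 55, 51, 58]]
print([round(s, 4) for s in link_relative_indices(data)])
```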

Random or Irregular variations

These variations are unpredictable and beyond human control. Earthquakes,
floods, etc. are responsible for these variations.