THEORIES
M. A. BOATENG
SAMPLING DISTRIBUTIONS
In statistics, it is not always possible to take into account all the members of a set.
In such a case, a sample, or many samples, are drawn from the population.
When it is necessary to make predictions about a population based on random
sampling, many samples of, say, N members are often taken before predictions
are made.
If the mean value and standard deviation of each of the samples are calculated, it
will be found that the results vary from sample to sample, even though the
samples are taken from the same population.
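The variation from sample to sample can be seen with a short simulation. This is only a sketch: the population below is hypothetical (3000 values generated around 175 with a spread of 8), and the sample size of 40 and the count of 5 samples are arbitrary choices.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical population: 3000 values scattered around 175 with spread 8.
population = [random.gauss(175, 8) for _ in range(3000)]

N = 40  # members per sample
sample_means = []
sample_sds = []
for _ in range(5):
    sample = random.sample(population, N)  # drawn without replacement
    sample_means.append(statistics.mean(sample))
    sample_sds.append(statistics.pstdev(sample))

# Each sample gives a slightly different mean and standard deviation.
for m, s in zip(sample_means, sample_sds):
    print(f"mean = {m:.2f}, standard deviation = {s:.2f}")
```

The printed means and standard deviations differ between samples even though every sample comes from the same population.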
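The standard deviation of the sampling distribution of means, called the standard error of the means, is:
$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{N}} \sqrt{\frac{N_p - N}{N_p - 1}} \qquad (*)$$
where $\sigma$ is the standard deviation of the population, $N$ is the sample size and $N_p$ is the size of the population.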
Equation (*) is used for a finite population of size $N_p$ and/or for sampling without
replacement.
When $N_p$ is very large compared with N, or when the population is infinite (this
can be considered to be the case when sampling is done with replacement), the
correction factor $\sqrt{\frac{N_p - N}{N_p - 1}}$ approaches unity and the standard error of the means reduces to $\sigma_{\bar{x}} = \sigma/\sqrt{N}$.
Theorem:
If all possible samples of size N are drawn from a population of size $N_p$ and the
mean value of the sampling distribution of means, $\mu_{\bar{x}}$, is determined, then:
$$\mu_{\bar{x}} = \mu \qquad (**)$$
where $\mu$ is the mean of the population.
If the sample size is large (usually taken as 30 or more), then the relationship
between the mean of the sampling distribution of means and the mean of the
population distribution is very near to that in equation (**).
Similarly, the relationship between the standard error of the means and the
standard deviation of the population is very near to that in equation (*).
EXAMPLE:
The heights of 3000 people are normally distributed with a mean of 175 cm and a
standard deviation of 8 cm. If random samples of 40 people are taken, find the
standard deviation and mean of the sampling distribution of means if sampling is
(a) With replacement
(b) Without replacement
SOLUTION:
(a) When sampling is done with replacement, the total number of possible samples
is infinite. Hence, the standard error of the means is:
$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{N}} = \frac{8}{\sqrt{40}} = 1.265 \text{ cm}$$
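(b) When sampling is done without replacement, the total number of possible
samples is finite. Hence, the standard error of the means is:
$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{N}} \sqrt{\frac{N_p - N}{N_p - 1}} = \frac{8}{\sqrt{40}} \sqrt{\frac{3000 - 40}{3000 - 1}} = (1.265)(0.9935) = 1.257 \text{ cm}$$
In both cases the sample size is large, so the mean of the sampling distribution of means equals the population mean: $\mu_{\bar{x}} = \mu = 175$ cm.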
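The two cases in this example can be checked numerically. The sketch below assumes a helper named `standard_error` (a hypothetical name, not from the slides) that applies the finite population correction factor only when a population size is supplied.

```python
import math

def standard_error(sigma, n, population_size=None):
    """Standard error of the means.

    sigma: population standard deviation
    n: sample size (N in the text)
    population_size: N_p in the text; None means an infinite population
        or sampling with replacement, so no correction factor is applied.
    """
    se = sigma / math.sqrt(n)
    if population_size is not None:
        se *= math.sqrt((population_size - n) / (population_size - 1))
    return se

se_infinite = standard_error(8, 40)                      # with replacement
se_finite = standard_error(8, 40, population_size=3000)  # without replacement
print(round(se_infinite, 3), round(se_finite, 3))
```

Because 3000 is large compared with 40, the correction factor is close to 1 and the two standard errors differ only slightly.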
Example:
1500 bolts have a mean mass of 6.5 kg and a standard deviation of 0.5 kg. Find the
probability that a sample of 60 bolts chosen at random from the group, without
replacement, will have a combined mass of between 378 kg and 396 kg.
SOLUTION:
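For the population: $N_p = 1500$; $\sigma = 0.5$ kg; $\mu = 6.5$ kg
For the sample: $N = 60$
$\mu_{\bar{x}} = \mu = 6.5$ kg
The standard error of the means,
$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{N}} \sqrt{\frac{N_p - N}{N_p - 1}} = \frac{0.5}{\sqrt{60}} \sqrt{\frac{1500 - 60}{1500 - 1}} = 0.0633 \text{ kg}$$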
Thus, the sample under consideration is part of a normal distribution of mean 6.5kg
and a standard error of the means of 0.0633kg.
If the combined mass of 60 bolts is between 378 kg and 396 kg, then the mean mass
of each of the 60 bolts lies between $\frac{378}{60}$ kg and $\frac{396}{60}$ kg, i.e. between 6.3 kg and 6.6 kg.
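The z-values corresponding to 6.6 kg and 6.3 kg are $z = \frac{6.6 - 6.5}{0.0633} = 1.58$ and $z = \frac{6.3 - 6.5}{0.0633} = -3.16$.
Reading from the Z table, the cumulative values obtained are 0.9429 and 0.0008 respectively.
Hence, the probability of the mean mass lying between 6.3 kg and 6.6 kg is:
0.9429 − 0.0008 = 0.9421
This means that if 10000 samples are drawn, about 9421 of these samples are expected to have a
combined mass of between 378 kg and 396 kg.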
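The bolt calculation can be reproduced without a printed Z table. This is a sketch that uses `statistics.NormalDist` from the Python standard library in place of the table; the variable names are illustrative.

```python
import math
from statistics import NormalDist

population_size = 1500   # N_p
sigma = 0.5              # population standard deviation, kg
mu = 6.5                 # population mean, kg
n = 60                   # sample size

# Standard error of the means with the finite population correction.
se = (sigma / math.sqrt(n)) * math.sqrt((population_size - n) / (population_size - 1))

# Convert the combined-mass limits to limits on the sample mean.
low_mean = 378 / n
high_mean = 396 / n

z_low = (low_mean - mu) / se
z_high = (high_mean - mu) / se

std_normal = NormalDist()  # mean 0, standard deviation 1
probability = std_normal.cdf(z_high) - std_normal.cdf(z_low)

print(f"z from {z_low:.2f} to {z_high:.2f}, probability = {probability:.4f}")
```

The result should agree with the table-based value to about three decimal places; small differences come from rounding the z-values before the table lookup.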
When a population is large, it is not practical to determine its mean and standard
deviation by using the basic formulae for these parameters.
Generally, a point estimate does not indicate how close the value is to the true
value of the quantity and should be accompanied by additional information on
which its merits may be judged.
A statement of the error or the precision of an estimate is often called its
reliability.
In statistics, when estimates are made of population parameters based on
samples, usually interval estimates are used.
CONFIDENCE INTERVALS
Let $\mu_S$ be the mean value of a sampling statistic of the sampling distribution, that is,
the mean value of the means of the samples or the mean value of the standard
deviations of the samples. Also, let $\sigma_S$ be the standard deviation of a sampling statistic, that is, the
standard deviation of the means of the samples or the standard deviation of the
standard deviations of the samples.
Because the sampling distributions of the means and of the standard deviations are
normally distributed, it is possible to predict the probability of the sampling statistic
lying in the intervals:
mean ± 1 standard deviation
mean ± 2 standard deviations
mean ± 3 standard deviations
NB: Parameters such as mean and standard deviation of a sampling distribution
are called sampling statistics, S.
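Interval                          Confidence level
mean ± 1 standard deviation       68.26%
mean ± 2 standard deviations      95.44%
mean ± 3 standard deviations      99.74%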
The percentage values in the table above are called confidence levels for
estimating a sampling statistic.
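These confidence levels are areas under the standardized normal curve and can be recomputed as a check. The sketch below uses `statistics.NormalDist` from the Python standard library; the exact areas can differ from the quoted percentages in the last digit because the quoted ones come from a four-figure table.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standardized normal curve

confidence_levels = {}
for k in (1, 2, 3):
    # Area under the curve between -k and +k standard deviations.
    confidence_levels[k] = std_normal.cdf(k) - std_normal.cdf(-k)

for k, level in confidence_levels.items():
    print(f"mean ± {k} standard deviation(s): {100 * level:.2f}%")
```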
A confidence level of 68.26% is associated with two distinct values, these being
$S - \sigma_S$ and $S + \sigma_S$,
i.e. between S − 1 standard deviation and S + 1 standard deviation.
These two values are called the confidence limits of the estimate and the
distance between the confidence limits is called the confidence interval.
A confidence interval indicates the expectation or confidence of finding an
estimate of a population parameter in that interval, based on a sampling statistic.
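The confidence limits for the mean of the population for an infinite population are:
$$\bar{x} \pm \frac{z_c\,\sigma}{\sqrt{N}}$$
where $\bar{x}$ is the mean of the sample, $\sigma$ is the standard deviation of the population, $N$ is the sample size and $z_c$ is the confidence coefficient.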
EXAMPLE:
It is found that the standard deviation of the diameters of rivets produced by a
certain machine over a long period of time is 0.018cm. The diameters of a random
sample of 100 rivets produced by this machine in a day have a mean value of
0.476 cm. If the machine produces 2500 rivets a day, determine:
(a) the 90% confidence limits, and
(b) the 97% confidence limits
for an estimate of the mean diameter of all the rivets produced by the machine in a day.
SOLUTION:
For the population:
Standard deviation, $\sigma = 0.018$ cm
Number in population, $N_p = 2500$
For the sample:
Number in sample, $N = 100$
Mean, $\bar{x} = 0.476$ cm
Since the population is finite and the standard deviation of the population is
known, we use
$$\bar{x} \pm \frac{z_c\,\sigma}{\sqrt{N}} \sqrt{\frac{N_p - N}{N_p - 1}}$$
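(a) For a 90% confidence level, $z_c$, the confidence coefficient, is 1.64 (read from
tables). Hence, the estimate of the confidence limits of the population mean, $\mu$, is:
$$0.476 \pm \frac{(1.64)(0.018)}{\sqrt{100}} \sqrt{\frac{2500 - 100}{2500 - 1}} = 0.476 \pm (0.00295)(0.9800) = 0.476 \pm 0.0029 \text{ cm}$$
Thus, the 90% confidence limits are 0.473 cm and 0.479 cm.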
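(b) For a 97% confidence level, the value of $z_c$ determined from a table of partial
areas under the standardized normal curve is 2.17.
Hence, the estimated value of the confidence limits of the population mean is:
$$0.476 \pm \frac{(2.17)(0.018)}{\sqrt{100}} \sqrt{\frac{2500 - 100}{2500 - 1}} = 0.476 \pm (0.0039)(0.9800) = 0.476 \pm 0.0038 \text{ cm}$$
Thus, the 97% confidence limits are 0.472 cm and 0.480 cm.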
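The rivet calculation can be sketched in code as follows. The function name `confidence_limits` is hypothetical, and the confidence coefficient is computed with `statistics.NormalDist().inv_cdf` instead of being read from a table, so it carries more decimal places than the tabulated 1.64 and 2.17.

```python
import math
from statistics import NormalDist

def confidence_limits(mean, sigma, n, confidence, population_size=None):
    """Confidence limits for a population mean when sigma is known."""
    # Two-sided interval: half of (1 - confidence) lies in each tail.
    z_c = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z_c * sigma / math.sqrt(n)
    if population_size is not None:
        # Finite population correction factor.
        margin *= math.sqrt((population_size - n) / (population_size - 1))
    return mean - margin, mean + margin

limits_90 = confidence_limits(0.476, 0.018, 100, 0.90, population_size=2500)
limits_97 = confidence_limits(0.476, 0.018, 100, 0.97, population_size=2500)
print([round(x, 3) for x in limits_90])
print([round(x, 3) for x in limits_97])
```

Raising the confidence level from 90% to 97% widens the interval, since a larger confidence coefficient multiplies the same standard error.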
(b) Estimating the mean and standard deviation of a population from sample data.
(Estimating the mean when the standard deviation of the population is unknown)
When the standard deviation of a large population is not known, several samples
are drawn from the population.
The mean, $\mu_{\bar{x}}$, and the standard deviation, $\sigma_{\bar{x}}$, of the sampling distribution of means may then be
determined.
The confidence limits of the mean value of the population, $\mu$, are given by:
$$\mu_{\bar{x}} \pm z_c\,\sigma_{\bar{x}}$$
where $z_c$ is the confidence coefficient corresponding to the confidence level
required.
EXAMPLE:
Several samples of 50 fuses selected at random from a large batch are tested when
operating at a 10% overload current and the mean time of the sampling
distribution before the fuses failed is 16.50 minutes. The standard error of the
mean is 1.4 minutes. Determine the estimated mean time to failure of the batch of
fuses for a confidence level of 90%.
SOLUTION:
For the sampling distribution: the mean, $\mu_{\bar{x}} = 16.50$ minutes
The standard error, $\sigma_{\bar{x}} = 1.4$ minutes
The estimated mean of the population is based on sampling distribution data only;
thus, with $z_c = 1.64$ for a 90% confidence level, the confidence limits will be:
$$\mu_{\bar{x}} \pm z_c\,\sigma_{\bar{x}} = 16.50 \pm (1.64)(1.4) = 16.50 \pm 2.30 \text{ minutes}$$
Thus, the 90% confidence interval for the mean time to failure is from 14.20 minutes to
18.80 minutes.
$$t = \frac{(\bar{x} - \mu)\sqrt{N - 1}}{s}$$
where $\bar{x}$ is the mean of the sample, $\mu$ is the mean value of the population from
which the sample is drawn, $s$ is the standard deviation of the sample and $N$ is the
number of independent observations in the sample.
The confidence limits of the mean value of a population based on a small sample
drawn at random from the population are given by:
$$\bar{x} \pm \frac{t_c\,s}{\sqrt{N - 1}}$$
where $t_c$ is called the confidence coefficient for small samples, analogous to $z_c$ for
large samples, $s$ is the standard deviation of the sample, $\bar{x}$ is the mean value of the
sample and $N$ is the number of members in the sample.
NB:
Definition (Degrees of freedom)
The degrees of freedom, $\nu$, is the sample number, N, minus the number of population parameters which must be
estimated for the sample.
When determining the mean of a population based on a small sample size, only
one parameter is to be estimated, hence the number of estimated parameters can always be
taken as 1, giving $\nu = N - 1$.
Example:
A sample of 12 measurements of the diameter of a bar is made and the mean of
the sample is 1.850 cm. The standard deviation of the sample is 0.16 mm. Determine:
(a) the 90% confidence limits, and
(b) the 70% confidence limits
for an estimate of the actual diameter of the bar.
(The answer to (b) is 1.847 cm and 1.853 cm; try obtaining it.)
SOLUTION:
For the sample: sample size, $N = 12$; mean, $\bar{x} = 1.850$ cm; standard deviation, $s = 0.16$ mm $= 0.016$ cm
Since the sample number is less than 30, we use
$$\bar{x} \pm \frac{t_c\,s}{\sqrt{N - 1}}$$
(a) The degrees of freedom are $12 - 1 = 11$, so $t_c = 1.796$ from the table (reading from the intersection of 0.05 and 11).
Thus,
$$1.850 \pm \frac{(1.796)(0.016)}{\sqrt{11}} = 1.850 \pm 0.00866 \text{ cm}$$
Hence, the 90% confidence limits are 1.841 cm and 1.859 cm, indicating that the
actual diameter is likely to lie within this interval with a 90% chance of being correct.
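The small-sample limits for the bar diameter can be checked with a few lines of code. This is a sketch: `small_sample_limits` is a hypothetical helper, and the confidence coefficient is passed in as a value read from a t-table rather than computed, because the Python standard library has no t-distribution.

```python
import math

def small_sample_limits(mean, s, n, t_c):
    """Confidence limits mean ± t_c * s / sqrt(n - 1) for a small sample.

    s is the standard deviation of the sample and t_c must be read from
    a t-table for n - 1 degrees of freedom.
    """
    margin = t_c * s / math.sqrt(n - 1)
    return mean - margin, mean + margin

# Part (a): 90% confidence, 11 degrees of freedom, t_c = 1.796 from the table.
lower, upper = small_sample_limits(mean=1.850, s=0.016, n=12, t_c=1.796)
print(round(lower, 3), round(upper, 3))
```

Part (b) can be tried the same way by passing the tabulated coefficient for the 70% confidence level.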
EXAMPLE:
The specific resistance of some copper wire of nominal diameter 1mm is estimated
by determining the resistance of 6 samples of the wire. The resistance values found
in ohms per meter were:
2.16, 2.14, 2.17, 2.15, 2.16 and 2.18.
Determine the 95% confidence interval for the true resistance of the wire.
SOLUTION:
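For the sample: sample size, $N = 6$
Mean, $\bar{x} = \frac{2.16 + 2.14 + 2.17 + 2.15 + 2.16 + 2.18}{6} = 2.16\ \Omega\,\text{m}^{-1}$
Standard deviation, $s = \sqrt{\frac{\sum (x - \bar{x})^2}{N}}$
$$s = \sqrt{\frac{(2.16 - 2.16)^2 + (2.14 - 2.16)^2 + (2.17 - 2.16)^2 + (2.15 - 2.16)^2 + (2.16 - 2.16)^2 + (2.18 - 2.16)^2}{6}} = \sqrt{\frac{0.001}{6}} = 0.0129\ \Omega\,\text{m}^{-1}$$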
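One possible way to carry the copper wire calculation through to an interval is sketched below. The mean and standard deviation follow the hand calculation; the coefficient `t_c = 2.571` is an assumption taken from a standard t-table (95% confidence, two-tailed, 5 degrees of freedom) and does not appear in the worked solution itself.

```python
import math
import statistics

resistances = [2.16, 2.14, 2.17, 2.15, 2.16, 2.18]  # ohms per metre
n = len(resistances)

mean = statistics.mean(resistances)
s = statistics.pstdev(resistances)  # divisor N, as in the hand calculation

# Assumed t-table value for 95% confidence (two-tailed) with n - 1 = 5
# degrees of freedom; it is not shown in the worked solution.
t_c = 2.571

margin = t_c * s / math.sqrt(n - 1)
print(f"mean = {mean:.2f}, s = {s:.4f}")
print(f"95% interval: {mean - margin:.3f} to {mean + margin:.3f}")
```

`statistics.pstdev` is used instead of `statistics.stdev` because the small-sample formula in this section divides by $\sqrt{N - 1}$ separately.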
HYPOTHESIS TESTING
The objective of most statistical investigations is to make inferences about
unknown population parameters based on information contained in the sample
data.
Very often researchers are confronted with problems which require taking
decisions based on available data, instead of finding estimates of the parameters.
In an attempt to arrive at such decisions, it becomes imperative to make
assumptions or statements about the populations and then subject them to
statistical verification using the sample observations or experimental evidence.
Hypothesis testing is therefore a statistical procedure that uses data from a random sample
to determine whether a statement about a population should be accepted
or not.
HYPOTHESIS
A hypothesis is a statement, assertion or speculation about the value of a
population parameter for the purpose of testing.
There are two types of hypotheses: the null hypothesis, $H_0$, and the alternative
hypothesis, $H_1$.
The null hypothesis, $H_0$, is the initial statement (assertion) and the alternative
hypothesis, $H_1$, is the hypothesis that contradicts the null hypothesis.
The general procedure in hypothesis testing is to
specify a value of the probability of type I error,
often called the significance level of the test, and then
design the test procedure so that the probability of
type II error has a suitably small value.
For small sample sizes (usually taken as N < 30), the sampling
distribution is not normally distributed, but approximates to
Student's t-distribution.
In this case, t-values rather than z-values are used.
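For finite populations of size $N_p$, the standard error of the means includes the correction factor $\sqrt{\frac{N_p - N}{N_p - 1}}$.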
EXAMPLE:
Cement is packed in bags by an automatic machine. The mean mass of
the contents of a bag is 1.000kg. Random samples of 36 bags are
selected throughout the day and the mean mass of a particular sample
is found to be 1.003kg. If the manufacturer is willing to accept a
standard deviation of 0.01kg on all bags packed and a level of
significance of 0.05, above which values the machine must be stopped
and adjustments made, determine if as a result of the sample under
test, the machine should be adjusted.
SOLUTION:
$\mu = 1.000$ kg, $\bar{x} = 1.003$ kg, $\sigma = 0.01$ kg, $N = 36$
$H_0: \mu = 1.000$
$H_1: \mu \neq 1.000$
Since the sample size is large and $\mu$, $\sigma$ and $N$ are known, the z-value of the
sample mean is:
$$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{N}} = \frac{1.003 - 1.000}{0.01/\sqrt{36}} = \frac{0.003}{0.00167} = 1.8$$
The z-value corresponding to a 0.05 level of significance (two-tailed) is 1.96.
Since $|z| < 1.96$, we fail to reject the null hypothesis; thus, the
machine should not be adjusted.
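The cement test can be reproduced in code as a check on the arithmetic. This is a sketch with illustrative variable names; the critical value is computed with `statistics.NormalDist().inv_cdf` rather than read from a table.

```python
import math
from statistics import NormalDist

mu_0 = 1.000     # mean mass under the null hypothesis, kg
x_bar = 1.003    # sample mean, kg
sigma = 0.01     # accepted standard deviation, kg
n = 36           # sample size
alpha = 0.05     # level of significance

z = (x_bar - mu_0) / (sigma / math.sqrt(n))

# Two-tailed test: alpha is split equally between the two tails.
z_critical = NormalDist().inv_cdf(1 - alpha / 2)

reject_null = abs(z) > z_critical
print(f"z = {z:.2f}, critical value = {z_critical:.2f}, reject H0: {reject_null}")
```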
TYPES OF TEST
We have the one-tailed and the two-tailed tests.
TWO-TAILED
A two-tailed test has hypotheses of the form:
$H_0: \mu = \mu_0$
$H_1: \mu \neq \mu_0$
ONE-TAILED
A one-tailed test has hypotheses of the form:
$H_0: \mu \leq \mu_0$, $H_1: \mu > \mu_0$
OR
$H_0: \mu \geq \mu_0$, $H_1: \mu < \mu_0$
TEST STATISTIC
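The test statistic is:
(i) $Z = \frac{\bar{x} - \mu_0}{\sigma_{\bar{x}}} = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$, where $n$ is the sample size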
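(ii) $t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$, where $s$ is the sample standard deviation computed with divisor $n - 1$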
The value of the test statistic, Z, is compared with the critical value, $Z_{\alpha/2}$, obtained
from the unit normal distribution.
The value of the test statistic, t, is compared with the critical value, $t_{\alpha/2,\,n-1}$.
SOLUTION:
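$H_0: \mu = 200$
$H_1: \mu \neq 200$
$\alpha = 0.01$
Test statistic:
$$Z = \frac{\bar{x} - \mu_0}{\sigma_{\bar{x}}} = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$$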
The test statistic value is compared with a value from the Z table, which is found from
$$\frac{\alpha}{2} = \frac{0.01}{2} = 0.005$$
$$1 - \frac{\alpha}{2} = 1 - 0.005 = 0.995$$
The corresponding value is 2.58; since it is a two-tailed test, we are looking at
the region between −2.58 and 2.58.
Since the calculated Z is less than the Z from the table, we fail to reject the null
hypothesis.
EXAMPLE:
The annual report of a cement manufacturing company reveals that the daily
mean number of bags of cement purchased in a community is at most 215 bags.
A random sample of 45 buyers was selected and observed. The mean number of
cement bags was 45 bags, with a standard deviation of 30.5 bags. Does the sample
data contradict the prior belief? Test the hypothesis using $\alpha = 0.05$.
SOLUTION:
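$H_0: \mu \leq 215$
$H_1: \mu > 215$
Test statistic:
$$Z = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{45 - 215}{30.5/\sqrt{45}} = -37.39$$
The critical value is $Z_{\alpha} = Z_{0.05} = 1.65$. Since $Z = -37.39$ is not greater than 1.65, it does not fall in the rejection region of this right-tailed test, so we fail to reject $H_0$: the sample data do not contradict the belief that the mean is at most 215 bags.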
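NB:
To reject or fail to reject a null hypothesis:
If the test statistic exceeds the critical value, i.e. $Z > Z_{\alpha}$ or $t > t_{\alpha}$ (in absolute value, with $\alpha/2$ in place of $\alpha$, for a two-tailed test), we reject the null
hypothesis.
$Z$ and $t$ represent the test statistic values.
To use the p-value approach:
If the p-value is less than $\alpha$, we reject the
null hypothesis, and vice versa.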
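The p-value rule can be illustrated for a z test statistic. This is a sketch only: `p_value_z` is a hypothetical helper built on `statistics.NormalDist`, and the value z = 1.8 is used purely as an illustrative input.

```python
from statistics import NormalDist

def p_value_z(z, tails=2):
    """p-value of a z test statistic under the standard normal curve.

    tails=2 gives the two-tailed p-value. tails=1 gives the area beyond
    |z| in a single tail, which is the one-tailed p-value only when z
    lies in the direction of the alternative hypothesis.
    """
    one_tail = 1 - NormalDist().cdf(abs(z))
    return tails * one_tail

alpha = 0.05
p = p_value_z(1.8)        # illustrative z-value, two-tailed
reject_null = p < alpha
print(f"p-value = {p:.4f}, reject H0: {reject_null}")
```

Comparing the p-value with the significance level gives the same decision as comparing the test statistic with the critical value, so either rule can be used.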
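The standard error of the difference between two sample means is:
$$\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$
For large samples, the z-value is:
$$z = \frac{(\bar{x}_1 - \bar{x}_2) - 0}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$$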
EXAMPLE:
An automatic machine is producing components and as a result of
many tests, the standard deviation of their size is 0.02 cm. Two samples
of 40 components are taken, the mean size of the first being 1.51 cm
and the second 1.52cm. Determine whether the size has altered
appreciably if a level of significance of 0.05 is adopted.
SOLUTION:
Since both samples are drawn from the same population, $\sigma_1 = \sigma_2 = \sigma = 0.02$ cm.
Also, $n_1 = n_2 = 40$, $\bar{x}_1 = 1.51$ cm, $\bar{x}_2 = 1.52$ cm and $\alpha = 0.05$.
$$z = \frac{(\bar{x}_1 - \bar{x}_2) - 0}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}} = \frac{1.51 - 1.52}{\sqrt{\frac{2(0.02)^2}{40}}} = -2.236$$
The z-value from the table lies between −1.96 and 1.96, since it is a two-tailed
test. Since the calculated z is outside the range of the tabulated z, we reject the null hypothesis and conclude that the size has
been altered.
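The two-sample comparison can be checked with a short function. This is a sketch: `two_sample_z` and its parameter names are hypothetical, and the hypothesised difference between the population means defaults to zero, as in the example.

```python
import math
from statistics import NormalDist

def two_sample_z(mean_1, mean_2, sigma_1, sigma_2, n_1, n_2, difference_0=0.0):
    """z-value for the difference between two large-sample means."""
    standard_error = math.sqrt(sigma_1**2 / n_1 + sigma_2**2 / n_2)
    return ((mean_1 - mean_2) - difference_0) / standard_error

z = two_sample_z(1.51, 1.52, 0.02, 0.02, 40, 40)
z_critical = NormalDist().inv_cdf(1 - 0.05 / 2)  # two-tailed, alpha = 0.05
reject_null = abs(z) > z_critical
print(f"z = {z:.3f}, critical = ±{z_critical:.2f}, reject H0: {reject_null}")
```

The sign of z only shows which sample mean is larger; the two-tailed decision depends on its absolute value.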