
SAMPLING AND ESTIMATION

THEORIES

M. A. BOATENG

SAMPLING DISTRIBUTIONS
In statistics, it is not always possible to take into account all the members of a set.
In such a case, a sample, or many samples, are drawn from a population.
When it is necessary to make predictions about a population based on random
sampling, often many samples of say N members are taken, before predictions
are made.
If the mean value and standard deviation of each of the samples is calculated, it
will be found that the results vary from sample to sample, even though the
samples are taken from the same population.

If M samples of N members are drawn at random from a population, the mean
values for the M samples together form a set of data.
Similarly, the standard deviations of the M samples collectively form a set of data.
Sets of data based on many samples drawn from a population are called
sampling distributions.
They are often used to describe the chance fluctuations of mean values and
standard deviations based on random sampling.

THE SAMPLING DISTRIBUTION OF THE MEANS

Theorem:
If all possible samples of size $N$ are drawn from a finite population of size $N_p$, without
replacement, and the standard deviation of the mean values of the sampling
distribution of means is determined, then:

$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{N}} \sqrt{\frac{N_p - N}{N_p - 1}}$$

where $\sigma_{\bar{x}}$ is the standard deviation of the sampling distribution of means and $\sigma$ is the
standard deviation of the population.
The standard deviation of a sampling distribution of mean values is called the
standard error of the means.

Thus, the standard error of the means is:

$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{N}} \sqrt{\frac{N_p - N}{N_p - 1}} \qquad (*)$$

Equation (*) is used for a finite population of size $N_p$ and/or for sampling without
replacement.
When $N_p$ is very large compared with $N$, or when the population is infinite (this
can be considered to be the case when sampling is done with replacement), the
correction factor $\sqrt{\frac{N_p - N}{N_p - 1}}$ approaches unity and equation (*) becomes:

$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{N}}$$

Theorem:
If all possible samples of size $N$ are drawn from a population of size $N_p$ and the
mean value of the sampling distribution of means is determined, then:

$$\mu_{\bar{x}} = \mu \qquad (**)$$

where $\mu_{\bar{x}}$ is the mean of the sampling distribution of means and $\mu$ is the mean of the population.
If the sample size is large (usually taken as 30 or more), then the relationship
between the mean of the sampling distribution of means and the mean of the
population distribution is very near to that in equation (**).

Similarly, the relationship between the standard error of the means and the
standard deviation of the population is very near to that in equation (*).

Another important property of a sampling distribution is that when the sample
size, $N$, is large, the sampling distribution of means approximates to a normal
distribution with mean value $\mu_{\bar{x}}$ and standard deviation $\sigma_{\bar{x}}$.
This is true for all normally distributed populations and also for populations which
are not normally distributed provided the population size is at least twice as large
as the sample size.
This property of normality of a sampling distribution is based on the central limit
theorem.

EXAMPLE:
The heights of 3000 people are normally distributed with a mean of 175 cm and a
standard deviation of 8 cm. If random samples of 40 people are taken, find the
standard deviation and mean of the sampling distribution of means if sampling is
(a) with replacement
(b) without replacement

SOLUTION:
(a) When sampling is done with replacement, the total number of possible samples
is infinite. Hence, the standard error of the mean is:

$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{N}} = \frac{8}{\sqrt{40}} = 1.265 \text{ cm}$$

The mean of the sampling distribution is:

$$\mu_{\bar{x}} = \mu = 175 \text{ cm}$$

(b) When sampling is done without replacement, the total number of possible
samples is finite. Hence, the standard error of the means is:

$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{N}} \sqrt{\frac{N_p - N}{N_p - 1}} = \frac{8}{\sqrt{40}} \sqrt{\frac{3000 - 40}{3000 - 1}} = 1.265 \times 0.9935 = 1.257 \text{ cm}$$

Since the sample size is large, the mean of the sampling distribution of means is the
same for both finite and infinite populations. Thus,

$$\mu_{\bar{x}} = \mu = 175 \text{ cm}$$
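The two standard errors in this example can be checked numerically. The following is a minimal Python sketch, not part of the original slides; the function name `standard_error` and its `population_size` parameter are illustrative assumptions.

```python
import math


def standard_error(sigma, n, population_size=None):
    """Standard error of the means.

    The finite population correction is applied only when a
    population size is supplied (sampling without replacement).
    """
    se = sigma / math.sqrt(n)
    if population_size is not None:
        se *= math.sqrt((population_size - n) / (population_size - 1))
    return se


# Heights example: sigma = 8 cm, samples of 40 from 3000 people
se_with = standard_error(8, 40)                           # with replacement
se_without = standard_error(8, 40, population_size=3000)  # without replacement
print(round(se_with, 3), round(se_without, 3))
```

Leaving `population_size` unset corresponds to the infinite-population case, where the correction factor is taken as unity.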

EXAMPLE:
1500 bolts have a mean mass of 6.5 kg and a standard deviation of 0.5 kg. Find the
probability that a sample of 60 bolts chosen at random from the group, without
replacement, will have a combined mass of between 378 kg and 396 kg.
SOLUTION:
For the population: $N_p = 1500$; $\sigma = 0.5$ kg; $\mu = 6.5$ kg
For the sample: $N = 60$
$\mu_{\bar{x}} = \mu = 6.5$ kg
The standard error of the means is:

$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{N}} \sqrt{\frac{N_p - N}{N_p - 1}} = \frac{0.5}{\sqrt{60}} \sqrt{\frac{1500 - 60}{1500 - 1}} = 0.0633 \text{ kg}$$

Thus, the sample under consideration is part of a normal distribution of mean 6.5 kg
and a standard error of the means of 0.0633 kg.
If the combined mass of 60 bolts is between 378 kg and 396 kg, then the mean mass
of each of the 60 bolts lies between $\frac{378}{60}$ kg and $\frac{396}{60}$ kg, i.e. between 6.3 kg and 6.6 kg.

Using the standard normal value:

$$z = \frac{x - \bar{x}}{\sigma}$$

Hence, for the sampling distribution of means, we get:

$$z = \frac{x - \mu_{\bar{x}}}{\sigma_{\bar{x}}}$$

6.3 kg corresponds to a z-value of:

$$z = \frac{6.3 - 6.5}{0.0633} = -3.16 \text{ standard deviations}$$

6.6 kg corresponds to a z-value of:

$$z = \frac{6.6 - 6.5}{0.0633} = 1.58 \text{ standard deviations}$$

Reading from the Z table, the cumulative probabilities for $z = 1.58$ and $z = -3.16$ are 0.9429 and 0.0008 respectively.
Hence, the probability of the mean mass lying between 6.3 kg and 6.6 kg is:
$0.9429 - 0.0008 = 0.9421$
This means that if 10000 samples are drawn, about 9421 of these samples are expected to have a
combined mass of between 378 kg and 396 kg.
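The table lookup above can be reproduced without a printed Z table. This is a minimal Python sketch, assuming the standard normal cumulative probability is computed from the error function; the helper name `phi` is illustrative and not from the original slides.

```python
import math


def phi(z):
    """Cumulative probability of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


n_p, n = 1500, 60          # population size and sample size
mu, sigma = 6.5, 0.5       # population mean and standard deviation (kg)

# Standard error of the means with the finite population correction
se = sigma / math.sqrt(n) * math.sqrt((n_p - n) / (n_p - 1))

# Convert the combined-mass limits into limits on the sample mean
z_low = (378 / n - mu) / se
z_high = (396 / n - mu) / se

prob = phi(z_high) - phi(z_low)
print(round(z_low, 2), round(z_high, 2), round(prob, 3))
```

Because the code carries full precision rather than four-figure table values, the probability may differ from the tabulated result in the fourth decimal place.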

THE ESTIMATION OF POPULATION PARAMETERS BASED ON A LARGE SAMPLE

When a population is large, it is not practical to determine its mean and standard
deviation by using the basic formulae for these parameters.

When a population is infinite, it is impossible to determine these values. For large
and infinite populations, the values of the mean and the standard deviation may
be estimated by using the data obtained from samples drawn from the
population.

POINT AND INTERVAL ESTIMATES


An estimate of a population parameter, such as the mean or standard deviation,
based on a single number is called a point estimate.
An estimate of a population parameter given by two numbers between which the
parameter is considered to lie is called an interval estimate.
Thus, if an estimate is made of the length of an object and the result is quoted as
150 cm, it is a point estimate.
If the result is quoted as 150 ± 10 cm, then it is an interval estimate, since the length is taken to lie
between 140 cm and 160 cm.

Generally, a point estimate does not indicate how close the value is to the true
value of the quantity and should be accompanied by additional information on
which its merits may be judged.
A statement of the error or the precision of an estimate is often called its
reliability.
In statistics, when estimates are made of population parameters based on
samples, usually interval estimates are used.

CONFIDENCE INTERVALS
Let $\mu_S$ be the mean value of a sampling statistic of the sampling distribution, that is,
the mean value of the means of the samples or the mean value of the standard
deviations of the samples. Also, let $\sigma_S$ be the standard deviation of a sampling statistic
of the sampling distribution, that is, the standard deviation of the means of the samples
or the standard deviation of the standard deviations of the samples.
Because the sampling distributions of the means and of the standard deviations are
normally distributed, it is possible to predict the probability of the sampling statistic
lying in the intervals:
mean ± 1 standard deviation
mean ± 2 standard deviations
mean ± 3 standard deviations
NB: Parameters such as the mean and standard deviation of a sampling distribution
are called sampling statistics, S.

The percentage probability of a sampling statistic, S, lying in each interval:

Interval                          Percentage
mean ± 1 standard deviation       68.26%
mean ± 2 standard deviations      95.44%
mean ± 3 standard deviations      99.74%

The percentage values in the table above are called confidence levels for
estimating a sampling statistic.
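The confidence levels in the table can be recomputed from the normal distribution. The following is a minimal Python sketch, not from the original slides; the function name `coverage` is an illustrative assumption.

```python
import math


def coverage(k):
    """Probability that a normal variable lies within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2.0))


for k in (1, 2, 3):
    print(f"mean ± {k} standard deviation(s): {100 * coverage(k):.2f}%")
```

The computed values agree with the table to within 0.01 percentage points; the small differences arise because the tabulated figures are built from four-figure normal table entries.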

A confidence level of 68.26% is associated with two distinct values, these being
$S - \sigma_S$ and $S + \sigma_S$,
i.e. between $S$ minus 1 standard deviation and $S$ plus 1 standard deviation.
These two values are called the confidence limits of the estimate and the
distance between the confidence limits is called the confidence interval.
A confidence interval indicates the expectation or confidence of finding an
estimate of a population parameter in that interval, based on a sampling statistic.

(a) ESTIMATING THE MEAN OF A POPULATION WHEN THE STANDARD DEVIATION OF THE POPULATION IS KNOWN

When a sample is drawn from a large population whose standard deviation is
known, the mean value of the sample, $\bar{x}$, can be determined.
This mean value can be used to make an estimate of the mean value of the
population, $\mu$. When this is done, the estimated mean value of the population is
given as lying between two values (the confidence interval).
If a high level of confidence is required in the estimated value of $\mu$, then the
range of the confidence interval will be large.
Conversely, a low level of confidence has a narrow confidence interval.

In general, any particular confidence level can be obtained in the estimate by
using:

$$\bar{x} \pm z_c \sigma_{\bar{x}}$$

where $z_c$ is the confidence coefficient corresponding to the particular confidence
level required.
Thus, for a 96% confidence level, the confidence limits of the population mean are
given by $\bar{x} \pm 2.05\,\sigma_{\bar{x}}$.
Since only one sample is drawn, the standard error of the means, $\sigma_{\bar{x}}$, is not known.

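However, it is shown that:

$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{N}} \sqrt{\frac{N_p - N}{N_p - 1}}$$

Thus, the confidence limits of the mean of the population are:

$$\bar{x} \pm \frac{z_c \sigma}{\sqrt{N}} \sqrt{\frac{N_p - N}{N_p - 1}}$$

for a finite population of size $N_p$.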


NB: $z_c = z_{1 - \alpha/2}$, where $\alpha = 1 - {}$(confidence level).

The confidence limits for the mean of the population for an infinite population are:

$$\bar{x} \pm \frac{z_c \sigma}{\sqrt{N}}$$

EXAMPLE:
It is found that the standard deviation of the diameters of rivets produced by a
certain machine over a long period of time is 0.018 cm. The diameters of a random
sample of 100 rivets produced by this machine in a day have a mean value of
0.476 cm. If the machine produces 2500 rivets a day, determine:
(a) the 90% confidence limits, and
(b) the 97% confidence limits
for an estimate of the mean diameter of all the rivets produced by the machine in a day.

SOLUTION:
For the population:
Standard deviation, $\sigma = 0.018$ cm
Number in population, $N_p = 2500$
For the sample:
Number in sample, $N = 100$
Mean, $\bar{x} = 0.476$ cm
Since the population is finite and the standard deviation of the population is
known, we use:

$$\bar{x} \pm \frac{z_c \sigma}{\sqrt{N}} \sqrt{\frac{N_p - N}{N_p - 1}}$$

(a) For a 90% confidence level, $z_c$, the confidence coefficient, is 1.64 (read from
tables). Hence, the estimate of the confidence limits of the population mean, $\mu$, is:

$$0.476 \pm (1.64)\frac{0.018}{\sqrt{100}} \sqrt{\frac{2500 - 100}{2500 - 1}} = 0.476 \pm (0.00295)(0.9800) = 0.476 \pm 0.0029 \text{ cm}$$

Thus, the confidence limits are 0.473 cm and 0.479 cm.

This indicates that if the mean diameter of a sample of 100 rivets is 0.476 cm, then it is
predicted that the mean diameter of all the rivets will be between 0.473 cm and
0.479 cm, and this prediction is made with confidence that it will be correct nine
times out of ten.

(b) For a 97% confidence level, the value of $z_c$ determined from a table of partial
areas under the standardized normal curve is 2.17.
Hence, the estimated value of the confidence limits of the population mean is:

$$0.476 \pm (2.17)\frac{0.018}{\sqrt{100}} \sqrt{\frac{2500 - 100}{2500 - 1}} = 0.476 \pm (0.0039)(0.9800) = 0.476 \pm 0.0038 \text{ cm}$$

Thus, the 97% confidence limits are 0.472 cm and 0.480 cm.
NB: It can be observed that the higher the value of the confidence level, the larger
the confidence interval, as (b) shows.
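Both sets of limits in the rivet example can be checked with a short calculation. This is a minimal Python sketch, assuming the confidence coefficients 1.64 and 2.17 quoted in the solution; the function name `mean_confidence_limits` is illustrative and not from the original slides.

```python
import math


def mean_confidence_limits(x_bar, sigma, n, z_c, population_size=None):
    """Confidence limits for a population mean when sigma is known.

    The finite population correction is applied only when a
    population size is supplied.
    """
    half_width = z_c * sigma / math.sqrt(n)
    if population_size is not None:
        half_width *= math.sqrt((population_size - n) / (population_size - 1))
    return x_bar - half_width, x_bar + half_width


# Rivet example: sigma = 0.018 cm, sample of 100 from 2500, sample mean 0.476 cm
limits_90 = mean_confidence_limits(0.476, 0.018, 100, 1.64, population_size=2500)
limits_97 = mean_confidence_limits(0.476, 0.018, 100, 2.17, population_size=2500)
print([round(v, 3) for v in limits_90])
print([round(v, 3) for v in limits_97])
```

The second interval is wider than the first, which illustrates the trade-off between confidence level and interval width.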

(b) Estimating the mean and standard deviation of a population from sample data.
(Estimating the mean when the standard deviation of the population is unknown)

When the standard deviation of a large population is not known, several samples
are drawn from the population.
The mean, $\mu_{\bar{x}}$, and the standard deviation, $\sigma_{\bar{x}}$, of the sampling distribution may be
determined.

The confidence limits of the mean value of the population, $\mu$, are given by:

$$\mu_{\bar{x}} \pm z_c \sigma_{\bar{x}}$$

where $z_c$ is the confidence coefficient corresponding to the confidence level
required.

To make an estimate of the standard deviation, $\sigma$, of a normally distributed
population:
A sampling distribution of the standard deviations of the samples is
formed.
The standard deviation of the sampling distribution is determined by using the
basic standard deviation formula.
This standard deviation is called the standard error of the standard deviations and
is usually written as $\sigma_s$.
If $s$ is the standard deviation of a sample, then the confidence
limits of the standard deviation of the population are given by:

$$s \pm z_c \sigma_s$$

EXAMPLE:
Several samples of 50 fuses selected at random from a large batch are tested when
operating at a 10% overload current and the mean time of the sampling
distribution before the fuses failed is 16.50 minutes. The standard error of the
mean is 1.4 minutes. Determine the estimated mean time to failure of the batch of
fuses for a confidence level of 90%.
SOLUTION:
For the sampling distribution: the mean, $\mu_{\bar{x}} = 16.50$ minutes
The standard error, $\sigma_{\bar{x}} = 1.4$ minutes

The estimated mean of the population is based on sampling distribution data only;
thus, the confidence limits will be:

$$\mu_{\bar{x}} \pm z_c \sigma_{\bar{x}} = 16.50 \pm (1.64)(1.4) = 16.50 \pm 2.30 \text{ minutes}$$

Thus, at the 90% confidence level, the mean time to failure is from 14.20 minutes to
18.80 minutes.

ESTIMATING THE MEAN OF A POPULATION BASED ON A SMALL SAMPLE SIZE

When the sample size is small (usually taken as less than 30), the techniques
used for estimating the population parameters become more and more
inaccurate as the sample size becomes smaller. This is because the sampling
distribution no longer approximates to a normal distribution.
To make realistic estimates when sample sizes are small, Student's t-distribution is used.
The t-value is determined from the relationship:

$$t = \frac{(\bar{x} - \mu)}{s} \sqrt{N - 1}$$

where $\bar{x}$ is the mean of the sample, $\mu$ is the mean value of the population from
which the sample is drawn, $s$ is the standard deviation of the sample and $N$ is the
number of independent observations in the sample.

The confidence limits of the mean value of a population based on a small sample
drawn at random from the population are given by:

$$\bar{x} \pm \frac{t_c s}{\sqrt{N - 1}}$$

where $t_c$ is called the confidence coefficient for small samples, analogous to $z_c$ for
large samples, $s$ is the standard deviation of the sample, $\bar{x}$ is the mean value of the
sample and $N$ is the number of members in the sample.

NB:
Definition (Degrees of freedom)
The number of degrees of freedom, $\nu$, is the sample number, $N$, minus the number of
population parameters which must be estimated for the sample.
When determining the mean of a population based on a small sample size, only
one parameter is to be estimated, hence $\nu$ (the degrees of freedom) can always be
taken as $N - 1$.
EXAMPLE:
A sample of 12 measurements of the diameter of a bar is made and the mean of
the sample is 1.850 cm. The standard deviation of the sample is 0.16 mm. Determine:
(a) the 90% confidence limits, and
(b) the 70% confidence limits
for an estimate of the actual diameter of the bar.
(The answer to (b) is 1.847 cm and 1.853 cm; try obtaining it.)

SOLUTION:
For the sample: sample size, $N = 12$; mean, $\bar{x} = 1.850$ cm; standard deviation, $s = 0.16$ mm $= 0.016$ cm.
Since the sample number is less than 30, we use:

$$\bar{x} \pm \frac{t_c s}{\sqrt{N - 1}}$$

(a) $t_c = 1.796$ from the t-table (reading from the intersection of 0.05 and 11),
since the degrees of freedom are $12 - 1 = 11$.
Thus,

$$1.850 \pm \frac{(1.796)(0.016)}{\sqrt{11}} = 1.850 \pm 0.00866 \text{ cm}$$

Hence, the 90% confidence limits are 1.841 cm and 1.859 cm, indicating that the
actual diameter is likely to lie within this interval with a 90% chance of being correct.
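The small-sample limits for part (a) can be verified numerically. This is a minimal Python sketch, assuming the confidence coefficient $t_c = 1.796$ is supplied from a t-table rather than computed; the function name `small_sample_limits` is illustrative and not from the original slides.

```python
import math


def small_sample_limits(x_bar, s, n, t_c):
    """Confidence limits x_bar ± t_c * s / sqrt(n - 1) for a small sample.

    s is the sample standard deviation computed with divisor n,
    and t_c is read from a t-table with n - 1 degrees of freedom.
    """
    half_width = t_c * s / math.sqrt(n - 1)
    return x_bar - half_width, x_bar + half_width


# Bar diameter example: 12 measurements, mean 1.850 cm, s = 0.016 cm
low, high = small_sample_limits(1.850, 0.016, 12, 1.796)
print(round(low, 3), round(high, 3))
```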

EXAMPLE:
The specific resistance of some copper wire of nominal diameter 1 mm is estimated
by determining the resistance of 6 samples of the wire. The resistance values found,
in ohms per meter, were:
2.16, 2.14, 2.17, 2.15, 2.16 and 2.18.
Determine the 95% confidence interval for the true resistance of the wire.
SOLUTION:
For the sample: sample size, $N = 6$
Mean,

$$\bar{x} = \frac{2.16 + 2.14 + 2.17 + 2.15 + 2.16 + 2.18}{6} = 2.16 \ \Omega\,\text{m}^{-1}$$

Standard deviation,

$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{N}} = \sqrt{\frac{(2.16 - 2.16)^2 + (2.14 - 2.16)^2 + (2.17 - 2.16)^2 + (2.15 - 2.16)^2 + (2.16 - 2.16)^2 + (2.18 - 2.16)^2}{6}} = \sqrt{\frac{0.001}{6}} = 0.0129 \ \Omega\,\text{m}^{-1}$$

We now locate $t_{0.025,\,5}$ on the t-table, which is 2.571.

The estimated value of the 95% confidence limits is given by:

$$\bar{x} \pm \frac{t_c s}{\sqrt{N - 1}} = 2.16 \pm \frac{(2.571)(0.0129)}{\sqrt{5}} = 2.16 \pm 0.0148 \ \Omega\,\text{m}^{-1}$$

Thus, the 95% confidence interval is (2.145, 2.175) $\Omega\,\text{m}^{-1}$.
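The whole calculation can be repeated directly from the six readings. This is a minimal Python sketch, assuming the tabulated value 2.571 for 5 degrees of freedom; the variable names are illustrative, and `statistics.pstdev` is used because the worked solution divides by $N$ when computing $s$.

```python
import math
import statistics

resistances = [2.16, 2.14, 2.17, 2.15, 2.16, 2.18]  # ohms per meter

n = len(resistances)
x_bar = statistics.mean(resistances)
s = statistics.pstdev(resistances)   # standard deviation with divisor N

t_c = 2.571                          # t-table value for 5 degrees of freedom
half_width = t_c * s / math.sqrt(n - 1)
interval = (x_bar - half_width, x_bar + half_width)

print(round(x_bar, 2), round(s, 4), [round(v, 3) for v in interval])
```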

HYPOTHESIS TESTING
The objective of most statistical investigations is to make inferences about
unknown population parameters based on information contained in the sample
data.
Very often researchers are confronted with problems which require taking
decisions based on available data, instead of finding estimates of the parameters.
In an attempt to arrive at such decisions, it becomes imperative to make
assumptions or statements about the populations and then subject them to
statistical verification using the sample observations or experimental evidence.
Hypothesis testing is therefore a statistical procedure that uses random sample
data to determine whether a statement about a population should be accepted
or not.

HYPOTHESIS
A hypothesis is a statement, assertion or speculation about the value of a
population parameter for the purpose of testing.
There are two types of hypotheses: the null hypothesis, $H_0$, and the alternative
hypothesis, $H_1$.
The null hypothesis, $H_0$, is the initial statement (assertion) and the alternative
hypothesis, $H_1$, is the hypothesis that contradicts the null hypothesis.

The probabilities of the two errors are given special symbols:
$\alpha = P(\text{type I error}) = P(\text{reject } H_0 \mid H_0 \text{ is true})$
$\beta = P(\text{type II error}) = P(\text{accept } H_0 \mid H_0 \text{ is false})$

Sometimes it is convenient to work with the power of the test, where
$\text{Power} = 1 - \beta = P(\text{reject } H_0 \mid H_0 \text{ is false})$
The general procedure in hypothesis testing is to
specify a value of the probability of type I error, $\alpha$,
often called the significance level of the test, and then
design the test procedure so that the probability of
type II error, $\beta$, has a suitably small value.

SIGNIFICANCE LEVEL (P-VALUE)

There is always a probabilistic component involved in the accept/reject
decision in testing a hypothesis.
The criterion that is used for accepting or rejecting a null
hypothesis is the significance level, $\alpha$, against which the
p-value of the test is compared.

SIGNIFICANCE TEST FOR POPULATION MEANS


When carrying out tests or measurements, it is often possible to form
a hypothesis as a result of these tests.
For example, the boiling point of water is found to be:
101.7, 99.8, 100.4, 100.3, 99.5 and 98.9 °C, as a result of
six tests.
The mean of these six results is 100.1 °C. Based on these results, how
confidently can we say that the result from the sampling is
significantly different from the true result?

Usually in significance tests, some predictions about population
parameters based on sample data are required.
In significance tests for population means, a random sample is drawn
from the population and the mean value of the sample, $\bar{x}$, is
determined.
The testing procedure depends on whether or not the standard
deviation of the population is known.

(ONE SAMPLE TEST)

(a) When the standard deviation of the population is known
A null hypothesis is made that there is no difference between the
value of a sample mean, $\bar{x}$, and that of the population, $\mu$, i.e. $H_0: \bar{x} = \mu$.
If many samples had been drawn from a population and a sampling
distribution of means had been formed, then provided $N$ is
large (usually taken as $N \geq 30$), the mean values would form a normal
distribution having a mean value of $\mu_{\bar{x}}$ and a standard deviation of
$\sigma_{\bar{x}}$ (the standard error).

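Using the standard normal variate, $z$, the relationship is:

$$z = \frac{\bar{x} - \mu_{\bar{x}}}{\sigma_{\bar{x}}}$$

However, for finite populations:

$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{N}} \sqrt{\frac{N_p - N}{N_p - 1}}$$

and for infinite populations:

$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{N}} \quad \text{and} \quad \mu_{\bar{x}} = \mu$$

where $N$ is the sample size, $N_p$ is the size of the population, $\mu$ is the
mean of the population and $\sigma$ is the standard deviation of the
population. Thus:
For finite populations of size $N_p$:

$$z = \frac{\bar{x} - \mu}{\frac{\sigma}{\sqrt{N}} \sqrt{\frac{N_p - N}{N_p - 1}}}$$

For infinite populations:

$$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{N}}$$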

The alternative hypothesis is $H_1: \bar{x} \neq \mu$.

The decision will be to:
Reject the null hypothesis at a given significance level if the calculated $|z|$ exceeds the tabulated critical value; otherwise,
fail to reject the null hypothesis.

For small sample sizes (usually taken as $N < 30$), the sampling
distribution is not normally distributed, but approximates to
Student's t-distribution.
In this case, t-values rather than z-values are used.
For finite populations of size $N_p$:

$$|t| = \frac{|\bar{x} - \mu|}{\frac{\sigma}{\sqrt{N}} \sqrt{\frac{N_p - N}{N_p - 1}}}$$

For infinite populations:

$$|t| = \frac{|\bar{x} - \mu|}{\sigma / \sqrt{N}}$$

where $|t|$ is the modulus of $t$, i.e. the positive value of $t$.


(b) When the standard deviation of the population is NOT known
It is found in practice that if the standard deviation of a sample is
determined, its value is less than the value of the standard deviation
of the population from which it is drawn.

This is as expected, since the range of a sample is likely to be less than
the range of the population.
The difference between the two standard deviations becomes more
pronounced when the sample size is small.

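The relationship between the sample variance, $s^2$, and the population
variance, $\sigma^2$, is:

$$\sigma^2 = s^2 \left( \frac{N}{N - 1} \right)$$

where $\frac{N}{N - 1}$ is Bessel's correction.

Thus an estimate of the standard deviation of a population, $\hat{\sigma}$, using
the sample standard deviation, $s$, is:

$$\hat{\sigma}^2 = s^2 \left( \frac{N}{N - 1} \right), \quad \text{i.e.} \quad \hat{\sigma} = s \sqrt{\frac{N}{N - 1}}$$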

For large samples, the factor $\sqrt{\frac{N}{N - 1}}$ can be omitted, i.e.

$$\hat{\sigma} \approx s$$

For small samples, the factor cannot be omitted, i.e.

$$\hat{\sigma} = s \sqrt{\frac{N}{N - 1}}$$
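The effect of Bessel's correction can be seen on a small data set. This is a minimal Python sketch that reuses the six resistance readings from the earlier copper wire example purely as illustrative data; the variable names are assumptions, not from the original slides.

```python
import math
import statistics

sample = [2.16, 2.14, 2.17, 2.15, 2.16, 2.18]
n = len(sample)

s = statistics.pstdev(sample)              # sample standard deviation, divisor N
sigma_hat = s * math.sqrt(n / (n - 1))     # estimate with Bessel's correction

# statistics.stdev divides by N - 1, so it already includes the correction
print(round(s, 4), round(sigma_hat, 4), round(statistics.stdev(sample), 4))
```

For a sample this small the corrected estimate is noticeably larger than $s$; as $N$ grows, the factor $\sqrt{N/(N-1)}$ tends to 1 and the two values converge.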

EXAMPLE:
Cement is packed in bags by an automatic machine. The mean mass of
the contents of a bag is 1.000 kg. Random samples of 36 bags are
selected throughout the day and the mean mass of a particular sample
is found to be 1.003 kg. If the manufacturer is willing to accept a
standard deviation of 0.01 kg on all bags packed and a level of
significance of 0.05, above which values the machine must be stopped
and adjustments made, determine if, as a result of the sample under
test, the machine should be adjusted.
SOLUTION:
$\mu = 1.000$ kg, $\bar{x} = 1.003$ kg, $\sigma = 0.01$ kg, $N = 36$
$H_0: \bar{x} = \mu$
$H_1: \bar{x} \neq \mu$

Since the sample size is large and $N$, $\sigma$ and $\mu$ are known, the z-value of the
sample mean is:

$$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{N}} = \frac{1.003 - 1.000}{0.01 / \sqrt{36}} = \frac{0.003}{0.00167} = 1.8$$

The z-value corresponding to a 0.05 level of significance (two-tailed) is 1.96.
Since $1.8 < 1.96$, we fail to reject the null hypothesis; thus, the
machine should not be adjusted.
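The decision in this example can be reproduced in a few lines. This is a minimal Python sketch, assuming the two-tailed critical value 1.96 quoted in the solution; the variable names are illustrative.

```python
import math

mu, x_bar, sigma, n = 1.000, 1.003, 0.01, 36

# z-value of the sample mean for an infinite population
z = (x_bar - mu) / (sigma / math.sqrt(n))

z_critical = 1.96                  # two-tailed critical value at the 0.05 level
reject = abs(z) > z_critical

print(round(z, 2), "reject H0" if reject else "fail to reject H0")
```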

TYPES OF TEST
We have the one-tailed and the two-tailed tests.
TWO-TAILED
A two-tailed test is a test of hypotheses which takes the form:
$H_0: \mu = \mu_0$
$H_1: \mu \neq \mu_0$
ONE-TAILED
A one-tailed test is a test of hypotheses which takes the form:
$H_0: \mu \leq \mu_0$ and $H_1: \mu > \mu_0$
OR
$H_0: \mu \geq \mu_0$ and $H_1: \mu < \mu_0$

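TEST STATISTIC
The test statistic is:
(i)

$$Z = \frac{\bar{x} - \mu_0}{SE(\bar{x})} = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}$$

where $n$ is large ($n \geq 30$); $\sigma$ is replaced by the sample standard deviation, $s$, if it is
unknown.

(ii)

$$t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$$

where $n$ is small ($n < 30$) and $\sigma$ is unknown.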

The value of the test statistic, $Z$, is compared with the critical value, $Z_{\alpha/2}$, obtained
from the unit normal distribution.
The value of the test statistic, $t$, is compared with the critical value, $t_{\alpha/2,\,n-1}$,
obtained from the t-distribution table with $n - 1$ degrees of freedom.

EXAMPLE:
The efficiency ratings of RTEP employees at an engineering firm have been normally
distributed over a long period. The mean of the distribution is 200 and the standard
deviation is 16. Recently, however, young employees have been hired and new
production methods inaugurated. The efficiency ratings of 100 production
employees were analyzed and the mean found to be 203.5. Using $\alpha = 0.01$, test
the hypothesis that the mean is still 200.

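SOLUTION:

$H_0: \mu = 200$
$H_1: \mu \neq 200$

$\alpha = 0.01$
Test statistic:

$$Z = \frac{\bar{x} - \mu_0}{SE(\bar{x})} = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}} = \frac{203.5 - 200}{16 / \sqrt{100}} = \frac{3.5}{1.6} = 2.19$$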

The test statistic is compared with a value from the Z table, found by
locating $\left(1 - \frac{\alpha}{2}\right)$ on the standard normal table:

$$\frac{\alpha}{2} = \frac{0.01}{2} = 0.005$$

$$1 - \frac{\alpha}{2} = 1 - 0.005 = 0.995$$

The corresponding value is 2.58. Since it is a two-tailed test, we are looking at
the region between $-2.58$ and $2.58$.
Since the calculated Z (2.19) is less than the Z from the table (2.58), we fail to reject the null
hypothesis.
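The critical value and the decision can also be obtained without a printed table. This is a minimal Python sketch using `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the variable names are illustrative, and the p-value line is an addition showing the equivalent p-value approach.

```python
import math
from statistics import NormalDist

mu_0, x_bar, sigma, n, alpha = 200, 203.5, 16, 100, 0.01

z = (x_bar - mu_0) / (sigma / math.sqrt(n))

# Two-tailed test: locate 1 - alpha/2 on the standard normal distribution
z_critical = NormalDist().inv_cdf(1 - alpha / 2)

# Equivalent p-value for the two-tailed test
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

reject = abs(z) > z_critical
print(round(z, 2), round(z_critical, 2), "reject H0" if reject else "fail to reject H0")
```

Both criteria lead to the same decision: the test statistic lies inside the critical region boundaries and the p-value is larger than $\alpha$.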

EXAMPLE:
The annual report of a cement manufacturing company reveals that the daily
mean number of bags of cement purchased in a community is at most 215 bags.
A random sample of 45 buyers was selected and observed. The mean number of
cement bags was 45 bags, with a standard deviation of 30.5 bags. Does the sample
data contradict the prior belief? Test the hypothesis using $\alpha = 0.05$.

SOLUTION:
$H_0: \mu \leq 215$
$H_1: \mu > 215$
Test statistic:

$$Z = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} = \frac{45 - 215}{30.5 / \sqrt{45}} = -37.39$$

The critical value is $Z_{\alpha} = Z_{0.05} = 1.65$. Since $Z = -37.39$ does not exceed 1.65, we fail to
reject $H_0$: a sample mean far below 215 does not contradict the belief that the mean is at most 215.

NB:
To reject or fail to reject a null hypothesis:
If $|Z| > Z_{\alpha/2}$ or $|t| > t_{\alpha/2,\,n-1}$, we reject the null
hypothesis.
$Z$ and $t$ represent the test statistic values.
To use the p-value approach:
If the p-value is less than $\alpha$, we reject the
null hypothesis, and vice versa.

COMPARING TWO SAMPLE MEANS

Sometimes it may be necessary to compare the performance of, say,
workers in two different engineering firms.
The null hypothesis for this kind of test is that there is no difference
between the mean values of the two populations.
This is based on the theorem below:
Theorem:
If $\bar{x}_1$ and $\bar{x}_2$ are the means of random samples of size $N_1$ and $N_2$ drawn
from populations having means of $\mu_1$ and $\mu_2$ and standard
deviations of $\sigma_1$ and $\sigma_2$, then the sampling distribution of the
differences of the means, $(\bar{x}_1 - \bar{x}_2)$, is a close approximation to a
normal distribution with a mean of 0 (under the null hypothesis) and a standard deviation of

$$\sqrt{\frac{\sigma_1^2}{N_1} + \frac{\sigma_2^2}{N_2}}$$

For large samples, the z-value is:

$$z = \frac{(\bar{x}_1 - \bar{x}_2) - 0}{\sqrt{\frac{\sigma_1^2}{N_1} + \frac{\sigma_2^2}{N_2}}}$$

For small samples, the t-value is:

$$t = \frac{(\bar{x}_1 - \bar{x}_2) - 0}{\sqrt{\frac{\sigma_1^2}{N_1} + \frac{\sigma_2^2}{N_2}}}$$

EXAMPLE:
An automatic machine is producing components and, as a result of
many tests, the standard deviation of their size is 0.02 cm. Two samples
of 40 components are taken, the mean size of the first being 1.51 cm
and the second 1.52 cm. Determine whether the size has altered
appreciably if a level of significance of 0.05 is adopted.
SOLUTION:
Since both samples are drawn from the same population, $\sigma_1 = \sigma_2 = \sigma = 0.02$ cm.
Also, $N_1 = N_2 = 40$, $\bar{x}_1 = 1.51$ cm, $\bar{x}_2 = 1.52$ cm and $\alpha = 0.05$.

$H_0: \mu_1 - \mu_2 = 0$, i.e. the size of the component is not altered
$H_1: \mu_1 - \mu_2 \neq 0$, i.e. the size of the component is altered
Since the sample is large:

$$z = \frac{(\bar{x}_1 - \bar{x}_2) - 0}{\sqrt{\frac{\sigma_1^2}{N_1} + \frac{\sigma_2^2}{N_2}}} = \frac{1.51 - 1.52}{\sqrt{\frac{2(0.02)^2}{40}}} = -2.236$$

The z-value from the table is between $-1.96$ and $1.96$ since it is a two-tailed
test. Since the calculated z is outside the range of the tabulated z, we reject the
null hypothesis and conclude that the size has been altered.
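The two-sample calculation can be checked as follows. This is a minimal Python sketch, assuming the two-tailed critical value 1.96 used in the solution; the function name `two_sample_z` is illustrative and not from the original slides.

```python
import math


def two_sample_z(x1, x2, sigma1, sigma2, n1, n2):
    """z-value for the difference of two sample means under H0: no difference."""
    return (x1 - x2) / math.sqrt(sigma1 ** 2 / n1 + sigma2 ** 2 / n2)


# Component example: both samples of 40, common sigma of 0.02 cm
z = two_sample_z(1.51, 1.52, 0.02, 0.02, 40, 40)

z_critical = 1.96                  # two-tailed critical value at the 0.05 level
reject = abs(z) > z_critical

print(round(z, 3), "reject H0" if reject else "fail to reject H0")
```

The sign of $z$ only reflects which sample mean is larger; for a two-tailed test the decision depends on $|z|$.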