Topic 6 Statistical Inference

Estimates and their accuracy:
Confidence Intervals & Hypothesis Testing
Error Types, Power & Sample size

MBB: Chapters 8 & 9
1. Why do inference? (8.1, 8.2)
2. Confidence interval for $\mu$ (8.3, 8.4, 8.5)
3. Test of significance for $\mu$ (9.1, 9.2, 9.3)
4. p value and level of significance (9.3)
5. Type I and Type II errors (9.3)
6. One tailed tests & CIs (8.8)
7. Statistical power (9.3)
8. Power of a z-test (one & two sided)
9. Practical versus statistical significance
10. Obtaining the sample size (8.9)
Up until now we have assumed we know the population parameters $\mu$ and $\sigma^2$.
We have made predictions about the next observation or the sample mean etc.
This is not very realistic!
Usually we have a sample and want to make predictions about the population, or some population parameter (e.g. $\mu$ or $\sigma^2$).

Example:
In past years, the height of male STAT171 students has followed a Normal distribution with $\mu$ = 175 cm and $\sigma$ = 15 cm.
We want to investigate whether the average height of this year's male STAT171 students is any different.

We take a random sample of 10 males from this year and obtain their heights.
We obtain: 172 178 185 190 175 163 184 176 173 197
Sample statistics: $\bar{x}$ = 179.3, s = 9.84, n = 10
What conclusions can we make about the average height of all the male STAT171 students for this year?

The estimate of the population average height for this year is the sample average, $\bar{x}$ = 179.3.
But it is very unlikely that $\mu$ is exactly 179.3!
Could it be 175? 180? 190?
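The sample statistics quoted for the height data can be checked directly. The following is a minimal sketch using only Python's standard library; the variable names are illustrative and not part of the original notes.

```python
from statistics import mean, stdev

# Heights (cm) of the 10 sampled male students, as listed in the example.
heights = [172, 178, 185, 190, 175, 163, 184, 176, 173, 197]

n = len(heights)
x_bar = mean(heights)   # sample mean
s = stdev(heights)      # sample standard deviation (divisor n - 1)

print(f"n = {n}, x-bar = {x_bar:.1f}, s = {s:.2f}")
```

Note that `stdev` uses the n - 1 divisor, which is what the quoted s = 9.84 corresponds to.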
If we take a new sample we will almost certainly get a new $\bar{x}$, i.e. a different estimate of $\mu$.
Therefore, it makes sense to think in terms of a range of values that $\mu$ might be,
i.e. what values of $\mu$ (for this year) are believable based on the sample we obtained?
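We know from theory that:
$$P(-1.96 \le Z \le 1.96) = 0.95$$
If we can assume that $\bar{X}$ has a Normal (or approximately Normal) distribution, we can assume that:
$$\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim Z$$
So, we can write:

Start with:
$$P(-1.96 \le Z \le +1.96) = 0.95$$
substitute for Z:
$$P\left(-1.96 \le \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \le +1.96\right) = 0.95$$
mult by $\sigma/\sqrt{n}$:
$$P\left(-1.96\frac{\sigma}{\sqrt{n}} \le \bar{X} - \mu \le +1.96\frac{\sigma}{\sqrt{n}}\right) = 0.95$$
subtract $\bar{X}$:
$$P\left(-\bar{X} - 1.96\frac{\sigma}{\sqrt{n}} \le -\mu \le -\bar{X} + 1.96\frac{\sigma}{\sqrt{n}}\right) = 0.95$$
mult by (-1), which reverses the inequalities:
$$P\left(\bar{X} + 1.96\frac{\sigma}{\sqrt{n}} \ge \mu \ge \bar{X} - 1.96\frac{\sigma}{\sqrt{n}}\right) = 0.95$$
write with smaller value at left hand end:
$$P\left(\bar{X} - 1.96\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + 1.96\frac{\sigma}{\sqrt{n}}\right) = 0.95$$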
This is called the 95% C.I. for $\mu$. All we need is $\bar{X}$, n and $\sigma$.
(when X is normally distributed and $\sigma$ is known):
There is a 95% chance that the interval
$$\left(\bar{X} - 1.96\frac{\sigma}{\sqrt{n}},\ \bar{X} + 1.96\frac{\sigma}{\sqrt{n}}\right)$$
includes the true population mean $\mu$.
But there is a 5% chance that it doesn't.
Note: $\bar{x}$ is always in the centre of the interval - it is still our best single point estimate of $\mu$.

By using a different z critical value, we can change the level of confidence of the interval. We can have:
a 99% confidence interval (zcrit = 2.575),
a 90% confidence interval (zcrit = 1.645),
etc.
But 95% is by far the most commonly used (see why later).
In general terms, a 100(1-$\alpha$)% confidence interval for $\mu$ is:
$$\bar{X} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$
[Figure: standard Normal curve with central area $1-\alpha$ between $-z_{\alpha/2}$ and $z_{\alpha/2}$, and area $\alpha/2$ in each tail]
General case of a C.I.:
sample estimate $\pm$ crit value * se(est)

$\bar{x}$ = 179.3, n = 10, $\sigma$ = 15 (assumed)
We assume $\sigma$ is unchanged from previous years but $\mu$ may be different (often quite a valid assumption).

95% confidence interval for $\mu$:
$(\bar{x} \pm 1.96 \cdot \sigma/\sqrt{n})$
= $(179.3 \pm 1.96 \cdot 15/\sqrt{10})$
= $(179.3 \pm 9.297)$
= (170.0, 188.6)
We are 95% confident that the interval 170.0 to 188.6 includes the population mean $\mu$, where $\mu$ is the true mean height for all the male STAT171 students for this year.
Recall, $\mu$ is a constant (does not change) for any specific population. It is the interval which changes from sample to sample.

99% confidence interval for $\mu$:
$(\bar{x} \pm z_{0.005} \cdot \sigma/\sqrt{n})$
= $(179.3 \pm 2.575 \cdot 15/\sqrt{10})$
= $(179.3 \pm 12.2)$
= (167.1, 191.5)
There is a 1% chance this interval does not contain $\mu$.

90% confidence interval for $\mu$:
$(\bar{x} \pm 1.645 \cdot \sigma/\sqrt{n})$
= $(179.3 \pm 1.645 \cdot 15/\sqrt{10})$
= $(179.3 \pm 7.8)$
= (171.5, 187.1)
There is a 10% chance this interval does not contain $\mu$.

Note: the more confident that we want to be that the interval includes $\mu$, the wider the interval needs to be.
Level of confidence is 100(1-$\alpha$)%.
$\alpha$ (alpha) has a distinct meaning, done later.
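The three intervals for the height example can be reproduced with a short calculation. This is a sketch only: the function name `z_interval` is illustrative, and the critical values come from the standard library's `NormalDist` rather than from tables, so they are slightly more precise than 1.645, 1.96 and 2.575.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(x_bar, sigma, n, confidence):
    """Two-sided z confidence interval for mu when sigma is known."""
    alpha = 1 - confidence
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for 95%
    margin = z_crit * sigma / sqrt(n)
    return x_bar - margin, x_bar + margin

# Height example: x-bar = 179.3, sigma = 15 (assumed), n = 10.
for level in (0.90, 0.95, 0.99):
    lo, hi = z_interval(179.3, 15, 10, level)
    print(f"{level:.0%} CI: ({lo:.1f}, {hi:.1f})")
```

To one decimal place the results agree with the hand calculations, and the interval widens as the confidence level increases.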
Remember that $\mu$ is fixed for a given population.
It is the confidence interval that is the variable.
Different samples from the same population will give different sample means and hence different confidence intervals.
Therefore:
Do say: There is a 95% chance that the calculated confidence interval includes $\mu$.
The statement should be made in terms of the confidence interval being the variable.
Don't say: There is a 95% chance that $\mu$ lies in the confidence interval - it either does or it doesn't.

The assumption is that:
$$\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim Z$$
This requires:
- X to be Normally distributed; OR
- n to be large enough to assume that $\bar{X}$ is approximately Normally distributed (using the CLT), as covered in Topic 5;
AND $\sigma$ is known.
There are occasions when $\sigma$ is known and $\mu$ is not. But it is much more common for $\sigma$ to be unknown as well (we will come back to this in Topic 7).
For the height example, s = 9.84, so it is reasonable, on this evidence, to assume $\sigma$ is still 15 (same as in the past).
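The idea that the interval, not $\mu$, is the variable can be illustrated by simulation. The sketch below assumes a population with $\mu$ = 175 and $\sigma$ = 15 (the past-years values from the example), draws many samples of size 10, and counts how often the 95% interval contains the fixed $\mu$. The function name, seed and number of repetitions are arbitrary choices for illustration.

```python
import random
from math import sqrt

def coverage(mu=175.0, sigma=15.0, n=10, reps=10_000, seed=1):
    """Fraction of simulated 95% z-intervals that contain the fixed mu."""
    rng = random.Random(seed)
    margin = 1.96 * sigma / sqrt(n)
    hits = 0
    for _ in range(reps):
        # A new sample gives a new sample mean, hence a new interval.
        x_bar = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        if x_bar - margin <= mu <= x_bar + margin:
            hits += 1
    return hits / reps

print(coverage())
```

The printed proportion should be close to 0.95: about 95% of the intervals cover $\mu$, while any single calculated interval either does or does not.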
Test of significance
The other approach to making conclusions about the population mean is to use the sample mean to test for a specific hypothesised possible value of $\mu$,
i.e. is the value of $\bar{x}$ evidence that $\mu$ could be or could not be a particular value?
This is hypothesis testing.
Whereas the approach that we have just used is:
what does $\bar{x}$ tell us about the likely values that $\mu$ could be?
This is the confidence interval.
Both approaches are equally valid (and use the same statistical theory).

We have observed: $\bar{x}$ = 179.3, n = 10, $\sigma$ = 15.0 (assumed)
The population mean for previous years was 175 cm.
There are only two possibilities for this year's male STAT171 population:
- $\mu$ is still 175 and we have observed a sample mean of 179.3, due purely to chance variation.
- $\mu$ is some other value (unknown, but not 175), and as a result, we have observed a sample mean of 179.3 due to chance variation about this different (but unknown) $\mu$.
To decide between them, we carry out a test of significance.

In general, $\bar{X} \sim N(\mu, \sigma^2/n)$ and
$$\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim Z$$
In this case $\bar{X} \sim N(\mu, 15^2/10)$, so
$$\frac{\bar{X} - \mu}{15/\sqrt{10}} \sim Z$$
Hence, if $\mu$ = 175 (i.e. the mean height of this year's STAT171 males is the same as in the past), then
$$\frac{\bar{X} - 175}{15/\sqrt{10}} \sim Z$$
If $\mu$ is not 175, then
$$\frac{\bar{X} - 175}{15/\sqrt{10}}$$
is NOT ~ Z, because we have subtracted the wrong $\mu$.

So we can test whether $\mu$ is 175 by testing whether
$$\frac{\bar{x} - 175}{15/\sqrt{10}}$$
is a z value.
If the test statistic is a z value, then $\mu$ could be 175.
If the test statistic is not a z value, then we have evidence that $\mu$ is not 175.
So is it a z value??
Problem:
A z value can be any value in the range $-\infty$ to $+\infty$.
So, we can only determine if it is a reasonable (or believable by chance) z value.
We can calculate how likely it is to observe by chance a z value such as we have calculated.
The higher the probability of obtaining the observed mean value (or one even more extreme), the more willing we will be to believe that $\mu$ for this year is still 175.
z-value believable (high prob of obtaining): our sample was believably from a population with the specified mean.
z-value not believable (low prob of obtaining): our sample was NOT believably from a population with the specified mean.

For our example the test statistic is:
$$z_{obs} = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = \frac{179.3 - 175}{15/\sqrt{10}} \approx 0.91$$
We determine whether 0.91 is a reasonable z value by calculating the probability of obtaining a random z value equal to the observed z, or a more unusual one (i.e. a z value less favourable to a $\mu$ of 175).
Using z-tables we get
$$p = P(|Z| \ge 0.91) = 2 \times 0.1814 = 0.3628$$
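The same test statistic and two-tailed probability can be computed without tables. This is a minimal sketch with illustrative variable names; it uses the unrounded z value, so the probability differs slightly from the table-based 0.3628.

```python
from math import sqrt
from statistics import NormalDist

# Height example: hypothesised mean 175, sigma = 15 (assumed), n = 10.
x_bar, mu0, sigma, n = 179.3, 175, 15, 10

z_obs = (x_bar - mu0) / (sigma / sqrt(n))
# Two-tailed probability: area beyond |z_obs| in both tails.
p_value = 2 * (1 - NormalDist().cdf(abs(z_obs)))

print(f"z_obs = {z_obs:.2f}, p = {p_value:.3f}")
```

With the unrounded z (about 0.9065) the probability is about 0.365, compared with 0.3628 when z is first rounded to 0.91 for table lookup.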
That is, if the true mean were 175, and we take a sample of size 10, we would expect to get a sample mean of 179.3 or a sample mean further away from 175 (at least 4.3 cm in either direction) in 36% of the samples, just due to chance variation.
This probability is called the p-value.
Another way of thinking about it:
If $\mu$ = 175 is true, we can expect to observe variation in the sample mean at least as big as that obtained (4.3 cm) in 36% of our samples.
This indicates that we do not have an unusual sample for a population with $\mu$ = 175.
A sample as extreme as the one we have would happen more than one-third of the time by pure chance (due to random variability).

Therefore, with a probability this high (~36%) it is quite reasonable to assume that $\mu$ this year is still 175 and that the deviation we have observed is explained merely by chance variation.
That is, there is no substantial (or statistically significant) evidence in our sample to be able to conclude that $\mu$ is not 175.
However it is possible that $\mu$ is some other value (e.g. 178) and that the sample mean this year is randomly varying about that population mean.
There is no way we can be sure of either conclusion (that $\mu$ is or is not 175), since we have based our conclusion on a probability.
A probability of 0.363 is reasonably easy to make a decision on.
But what if the probability were 0.12 (i.e. 12%), or 0.018, or 0.003, etc.?
We have to declare a cut-off!

The level of significance is denoted by $\alpha$ (alpha).
$\alpha$ is the maximum probability that will allow the hypothesised value to be rejected as not likely or not believable.
It (the significance level $\alpha$) is the cut-off between what is considered:
reasonably likely through pure random variation
versus
not believable through pure random variation alone.
if p > $\alpha$:
We conclude that the observed z-value is reasonably likely.
Hence the sample mean is close enough to the hypothesised value to conclude that the difference can be treated as due to chance,
i.e. do not reject the hypothesised value.
if p $\le$ $\alpha$:
We conclude that the observed z-value is reasonably unlikely.
Hence the sample mean is so far away from the hypothesised value that we can conclude that it is not reasonable that the difference could have occurred due to chance alone,
i.e. reject the hypothesised value.
The level of significance determines the extent of evidence that is required before the hypothesised value can be rejected as unlikely, and is the choice of the experimenter.
An $\alpha$ of 0.05 indicates that the test statistic would have to occur, by chance, less than 5% of the time, before the test statistic and hence the hypothesised value could be classified as unlikely.
An $\alpha$ of 0.01 indicates that the test statistic would have to occur less than 1% of the time, by chance, before the hypothesised value could be rejected as unlikely.
The most common level of significance used is $\alpha$ = 0.05 (but it is your choice).
(all tests of significance follow these steps)

1. Set up a null hypothesis H0
and an alternative hypothesis H1 (some texts use Ha - very American)
and a level of significance $\alpha$ (alpha).
The H0 is always for no effect or no change or no difference.
H0: $\mu = \mu_0$
H1: $\mu \ne \mu_0$
$\mu_0$ is the generic notation for the numeric value of the population mean - it is a number!
If we reject H0 we will choose to believe H1.
In this case: H0: $\mu$ = 175, H1: $\mu \ne$ 175, $\alpha$ = 0.05
Here $\mu_0$ = 175, but we still don't know what $\mu$ actually is, and most likely never will know.

2. Assume that H0 is true and calculate the observed test statistic value, using $\mu_0$ = the hypothesised (numeric) value for $\mu$.
- in this case, we have the z-test with test statistic:
$$Z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$$
- with observed value:
$$z_{obs} = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$$
The $z_{obs}$ is a number.

3. Calculate the p-value, i.e. the probability of obtaining a test statistic at least as extreme as the observed value, just due to chance (if the null hypothesis is true).
In this case: p-value = $P(|Z| \ge |z_{observed}|)$

4. Compare the p-value with $\alpha$, the level of significance, and hence either reject or don't reject H0.
if p > $\alpha$: Retain H0
there is insufficient evidence to be able to conclude that the true mean is not $\mu_0$,
i.e. the sample mean is not significantly different from the hypothesised mean.
if p $\le$ $\alpha$: Reject H0
there is sufficient evidence (at that level of statistical significance) to be able to conclude that the true mean is not $\mu_0$,
i.e. the sample mean is significantly different from the hypothesised mean.
(a significance level of 5%)
We reject H0 if p $\le$ 0.05:
This is equivalent to obtaining an observed z greater than 1.96 or less than -1.96, since $P(-1.96 \le Z \le 1.96) = 0.95$.
We retain H0 if p > 0.05:
A $z_{obs}$ between -1.96 and 1.96 will result in a p-value > 0.05 and will mean H0 is retained.
1.96 is called the 5% critical value.

For the general case, testing at the $\alpha$% level of significance:
[Figure: standard Normal curve with critical values $-z_{\alpha/2}$ and $z_{\alpha/2}$ marked]
Values of z in this range are deemed reasonably likely to occur by chance: Retain H0 with prob = $1 - \alpha$.
Values of z outside this range are deemed reasonably unlikely to occur by chance: Reject H0 with prob = $\alpha$.
Both of these probabilities are only these values in the case of H0 being true.

So for the general case, it doesn't matter if:
we compare p with $\alpha$ and reject if p $\le$ $\alpha$
or
we compare $z_{obs}$ with $z_{critical}$ and reject if $|z_{obs}| \ge z_{\alpha/2}$.
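The equivalence of the two decision rules can be checked numerically. The sketch below is illustrative only (the function names are not from the notes): it applies the p-value rule and the critical-value rule to a few example z values and shows that they give the same decision.

```python
from statistics import NormalDist

def reject_by_p(z_obs, alpha=0.05):
    """Reject H0 when the two-tailed p-value is at most alpha."""
    p = 2 * (1 - NormalDist().cdf(abs(z_obs)))
    return p <= alpha

def reject_by_critical(z_obs, alpha=0.05):
    """Reject H0 when |z_obs| is at least the critical value z_{alpha/2}."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return abs(z_obs) >= z_crit

for z in (-2.5, -1.0, 0.91, 1.5, 2.2):
    print(z, reject_by_p(z), reject_by_critical(z))
```

For the height example ($z_{obs}$ = 0.91) both rules retain H0 at the 5% level.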
Again, in carrying out the z-test we are assuming that:
$\bar{X}$ is Normally distributed. That is:
X is Normal or
n is large enough for the CLT;
and
$\sigma$ is known;
and that we have a representative sample.

NEVER say that $\mu$ is 175 or $\mu$ is not 175.
Whether we reject or retain H0, we could be making a mistake.
For the above example, if we tested that $\mu$ = 176, $\mu$ = 175.5, $\mu$ = 180 etc, we would retain H0 in every case.
This does not mean we believe each of these values to be correct - it means that we do not have sufficient evidence to disbelieve that value as the true population mean height for this year.

Basically the z test is in the form:
$$z_{obs} = \frac{\bar{x} - \mu_0}{\text{s.e.}(\bar{x})} = \frac{\text{statistic} - E[\text{statistic} \mid H_0 \text{ true}]}{\text{s.e.}(\text{statistic})}$$
This is comparing the distance the observed sample mean is away from $\mu_0$ relative to the inherent variability of the $\bar{X}$ random variable.
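The claim that several different hypothesised values would all be retained can be checked with a short loop. This is a sketch under the example's assumptions ($\bar{x}$ = 179.3, $\sigma$ = 15, n = 10, $\alpha$ = 0.05); the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

x_bar, sigma, n, alpha = 179.3, 15, 10, 0.05
se = sigma / sqrt(n)   # standard error of the sample mean

p_values = {}
for mu0 in (175, 175.5, 176, 180):
    z_obs = (x_bar - mu0) / se
    p = 2 * (1 - NormalDist().cdf(abs(z_obs)))
    p_values[mu0] = p
    print(f"mu0 = {mu0}: z = {z_obs:.2f}, p = {p:.3f}, retain H0: {p > alpha}")
```

Every hypothesised value in the list gives p > 0.05, so retaining H0 cannot be read as proof that any one of them is the true mean.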
If we retain H0:
The possibilities are we could be doing so because:
1. we used the correct value for $\mu$;
OR
2. we used an incorrect value for $\mu$, and because of an unusual sample, we obtained an unlikely z value.

If we reject H0:
The possibilities are we could be doing so because:
1. the value for $\mu$ is in fact different from that hypothesised;
OR
2. we used the right value for $\mu$, and because of an unusual sample, we obtained an unlikely z value.

For the decision Retain H0:
The sample mean is not significantly different from the hypothesised population mean value;
(If referring to the sample mean ... talk about statistical significance)
OR
There is insufficient evidence to conclude the population mean is different from $\mu_0$, etc.
(If referring to the population mean ... talk about statistical evidence)
For the decision Reject H0:
The sample mean is significantly different from the hypothesised population mean value;
(If referring to the sample mean ... talk about statistical significance)
OR
There is sufficient evidence to conclude the population mean is different from $\mu_0$.
(If referring to the population mean ... talk about statistical evidence)
If H0 is rejected, we can see the direction of the difference by looking back at the data:
* $\bar{X}$ can be sig. less than $\mu_0$
* $\bar{X}$ can be sig. greater than $\mu_0$
Remember that all decisions are made at a pre-specified significance level.

Always state clearly the level of significance.
For example, if we obtained a p-value of 0.03, we would:
reject H0 if we are testing at the 5% level of significance;
not reject H0 if we are testing at the 1% level of significance.
Not a problem if we are aware of our level of significance.
In effect here, the sample mean was far enough away from the hypothesised mean to:
confidently reject it at the 5% level,
but not far enough away to reject it at the 1% level of significance.
The choice of the level of significance is very important.
$\alpha$ = 5% is usually the best level - by far the most commonly used.
$\alpha$ = 1% is also often used.
Whatever choice is made, the level of significance should be determined in advance of the test,
i.e. determine in advance (of seeing the data) the level of evidence required.
General terminology:
If you reject H0 at the:
5% level: result is said to be significant
1% level: result is said to be highly significant

There are two types of errors possible when making a decision in a hypothesis test.
Type I error
This occurs if H0 is true and we incorrectly reject it.
Type II error
This occurs if H0 is false and we incorrectly retain it.
These are both the result of obtaining an unusual sample.
The problem is, we NEVER know if we have an unusual sample or not.
Type I error
This occurs if H0 is true and we incorrectly reject it,
because we have observed an unusual sample, but we don't know that.
The probability of this error type IS the significance level $\alpha$:
$\alpha$ = P(Type I Error)
= P(Reject H0 | H0 true)
because the test is designed so that if H0 is true, we will reject $\alpha$% of z-values.
Some professions (medical, psychology) call the probability of the complement of this error type the SPECIFICITY of the test.
Specificity = $1 - \alpha$ = P(Retain H0 | H0 true)
The probability of CORRECTLY retaining H0.

Type II error
This occurs if H0 is false and we incorrectly retain it,
because we have observed an unusual sample, but we don't know this.
The probability of this error type is given the notation $\beta$:
$\beta$ = P(Type II Error)
= P(Retain H0 | H0 false)
Evaluating Type II error rates is messy - they depend on $\alpha$ and the true value of $\mu$. The further the true $\mu$ is from the hypothesised value $\mu_0$, the easier it will be to reject H0,
e.g. P(Type II error | $\mu$ = 180) > P(Type II error | $\mu$ = 185)
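To see how the Type II error probability depends on the true mean, here is a minimal sketch for the two-sided z-test of the height example. It assumes $\mu_0$ = 175, $\sigma$ = 15, n = 10 and $\alpha$ = 0.05 as in the example; the function name `type_ii_error` is illustrative, and the calculation is a sketch rather than a method given in the notes.

```python
from math import sqrt
from statistics import NormalDist

def type_ii_error(mu_true, mu0=175, sigma=15, n=10, alpha=0.05):
    """P(retain H0 | true mean = mu_true) for the two-sided z-test."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Distance between the true and hypothesised means, in standard errors.
    shift = (mu_true - mu0) / (sigma / sqrt(n))
    # Under mu_true the test statistic is Normal(shift, 1);
    # H0 is retained when it falls between -z_crit and +z_crit.
    return std_normal.cdf(z_crit - shift) - std_normal.cdf(-z_crit - shift)

for mu in (180, 185):
    print(f"P(Type II error | mu = {mu}) = {type_ii_error(mu):.3f}")
```

Under these assumptions the error probability is roughly 0.82 when the true mean is 180 and roughly 0.44 when it is 185: the further the true mean is from 175, the less likely we are to retain H0 incorrectly.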
There is always the possibility of making a wrong decision!
Legal Analogy:

Significance Test
                       Situation in reality
Decision               H0 true            H0 false
H0 not rejected        correct            Type II error
H0 rejected            Type I error       correct

Jury trial
                       Situation in reality
Decision               Person innocent    Person guilty
Declared not guilty    correct            Error
Declared guilty        Error              correct

A Type II error occurs when H0 is false, and we incorrectly retain it.
The complement of a Type II error occurs when H0 is false, and we correctly reject it.
Statisticians call the probability of making this correct decision the POWER of the test.
Power = P(Reject H0 | H0 false)
= $1 - \beta$
The probability of CORRECTLY rejecting H0.
Some professions call this the SENSITIVITY of the test
(we will call it power).
Minimising error probabilities
Unfortunately, we can't make both error probabilities ($\alpha$ and $\beta$) low at the same time.
As $\alpha$ decreases, $\beta$ increases.
As $\alpha$ increases, $\beta$ decreases.
The compromise between the two error rates depends on the cost of each of the error types (cost in various terms: money, human lives, environmental pollution etc.)
The choice of $\alpha$ determines the relative importance of the two types of error.
For most tests, if in doubt, the best general compromise is an $\alpha$ of 5%, which will usually give a reasonably low $\beta$ (chance of missing the difference if one exists).
However in some circumstances, there will be reasons to have a very low $\alpha$ or a very low $\beta$.

Relationship between $\alpha$ and $\beta$ (one-tailed test)
We can see:
Picture 1: standard $\alpha$: $\alpha$ = 0.05, $\beta$ = 0.7405
Picture 2: lower $\alpha$: $\alpha$ = 0.01, $\beta$ = 0.9074
and
Picture 3: higher $\alpha$: $\alpha$ = 0.20, $\beta$ = 0.4371
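The trade-off between $\alpha$ and $\beta$ can be sketched for a one-tailed z-test. The notes do not state how far the true mean is from the hypothesised mean in the three pictures; the sketch below assumes a shift of exactly one standard error, a value chosen because it appears to reproduce the quoted $\beta$ values. Both the function name and the `shift` parameter are illustrative.

```python
from statistics import NormalDist

def beta_one_tailed(alpha, shift=1.0):
    """Type II error probability for an upper-tailed z-test.

    shift = (true mean - hypothesised mean) in standard errors.
    The default of 1.0 is an assumption, not a value given in the notes.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)   # reject H0 when z_obs > z_crit
    # Under the true mean the test statistic is Normal(shift, 1);
    # beta is the chance that it still falls below the critical value.
    return std_normal.cdf(z_crit - shift)

for a in (0.05, 0.01, 0.20):
    print(f"alpha = {a:.2f}  beta = {beta_one_tailed(a):.4f}")
```

Lowering $\alpha$ moves the critical value further out, so more of the true sampling distribution falls in the retain region and $\beta$ rises; raising $\alpha$ does the opposite.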
In trialing drugs to lower blood pressure, the null hypothesis H0 would be that the drug has no effect.
If doing a preliminary screening (lab testing) of a number of drugs:
a Type I error
would occur if there is evidence the drug is useful (when it isn't)
would lead to more intense testing - only a little bit costly (time and money)
a Type II error
would occur if there is insufficient evidence the drug is useful (when it is)
would stop further testing - very costly (in terms of lost potential benefits)

So, at the preliminary screening stage, we want to continue testing all drugs which may end up being useful.
a Type I error would not be costly
(doing a bit more testing on a useless drug)
a Type II error would be costly
(throwing out a useful drug)
So, set a very high P(Type I error), say $\alpha$ = 0.10.
This gives a low P(Type II error).
In the final stages, if trialing a new drug to lower blood pressure (for example):
a Type I error would be saying the drug is useful (when it isn't)
a LOT costly!
Spending masses of money on manufacturing and advertising etc.
a Type II error would occur if we missed a possible useful drug and don't invest in it
somewhat costly
(in terms of lost potential benefits to humanity)
but more drugs are always being tested.

So, in the final stages the error types have a very different cost than in the initial stages.
For the blood pressure drug example:
a Type I error would be costly
(manufacturing and marketing a useless drug)
a Type II error would not be so costly
(not developing a useful drug)
We want P(Type I error) to be very low at the expense of a high P(Type II error).
We want the probability of marketing an ineffective drug to be close to zero.
So, set a very low P(Type I error), say $\alpha$ = 0.005.
Decision based on z-test
+1.96 and -1.96 are the cut-off (critical) values for the z-score in a 5% test of significance.
retain H0 if
$$|z_{obs}| = \left|\frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}\right| < 1.96$$
reject H0 if
$$|z_{obs}| = \left|\frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}\right| \ge 1.96$$
The significance level $\alpha$% is the maximum chance of false rejection of H0 we are prepared to put up with.

Decision based on Confidence Interval
The 95% confidence interval is the range of values of $\mu_0$ that would be retained if we carried out a 5% test of significance.
CI contains $\mu_0$: retain H0
CI does not contain $\mu_0$: reject H0
In general terms: The 100(1 - $\alpha$)% C.I. is the range of values of $\mu_0$ that would be retained if we carried out an $\alpha$% test of significance.
For a 95% confidence interval for $\mu$, we used
$$P\left(\left|\frac{\bar{X} - \mu}{\sigma/\sqrt{n}}\right| < 1.96\right) = 0.95$$
In both cases (the z-test and the confidence interval) we used exactly the same rejection regions.
confidence level = 1 - significance level
= $1 - \alpha$
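The link between the interval and the test can be checked numerically for the height example. The sketch below (illustrative names, standard library only) scans a grid of hypothesised means and confirms that a value is retained by the 5% z-test exactly when it lies inside the 95% confidence interval.

```python
from math import sqrt
from statistics import NormalDist

x_bar, sigma, n, alpha = 179.3, 15, 10, 0.05
se = sigma / sqrt(n)
z_crit = NormalDist().inv_cdf(1 - alpha / 2)
ci_low, ci_high = x_bar - z_crit * se, x_bar + z_crit * se

def retained_by_test(mu0):
    """True if H0: mu = mu0 is retained by the two-tailed z-test."""
    return abs((x_bar - mu0) / se) < z_crit

def inside_ci(mu0):
    """True if mu0 lies inside the 100(1 - alpha)% confidence interval."""
    return ci_low < mu0 < ci_high

candidates = [160 + 0.5 * k for k in range(81)]   # 160.0, 160.5, ..., 200.0
agree = all(retained_by_test(m) == inside_ci(m) for m in candidates)
print(agree, retained_by_test(175), retained_by_test(165))
```

For example, 175 lies inside the interval and is retained, while 165 lies outside and is rejected.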
The 95% C.I. for $\mu$ (the true mean height this year) was (170.0, 188.6).
Testing H0: $\mu$ = 175 versus H1: $\mu \ne$ 175
175 lies in the 95% confidence interval for $\mu$.
Retain H0 at the 5% level of significance.
There is insufficient evidence to be able to conclude that the true mean height is different from 175 cm.
Testing H0: $\mu$ = 165 versus H1: $\mu \ne$ 165
165 lies outside the 95% confidence interval.
Reject H0 at the 5% level of significance.
There is sufficient evidence to conclude that the true mean height is different from 165 cm.
Based on this sample, there is evidence the true mean height for this year's male STAT171 students is greater than 165 cm.

Example:
The drying time of paint that is being marketed varies depending on the conditions (temperature, humidity, air movement, etc.).
Without the additive, the drying time of the paint follows a Normal distribution with a mean of 75 minutes, and a standard deviation of 9 minutes.
A new additive is being assessed as to whether it decreases drying time (a good thing!).
The new drying agent is added to see if it improves (lowers) the drying time.
If there is a sufficient decrease in (average) drying time, the new paint will be marketed.
An experiment is carried out to test the new paint.
For this sample of drying times:
n = 25, $\bar{x}$ = 71.5, $\sigma$ = 9 (assumed)
Q: Has the additive improved the drying time?
i.e. Do we market the new paint?

In this case we are only interested in the new paint (with the additive) if it DECREASES (typical) drying time.
If $\mu$ is still 75 (no change in mean) or is greater than 75 (increase in average drying time) we are NOT interested.
Hence our test becomes:
H0: $\mu$ = 75
H1: $\mu$ < 75
We carry out a one tailed test instead of a two tailed test, since we are only interested in more deviant outcomes that support H1.
To evaluate the p-value, we must assume the requirements for the z-test are satisfied:
$\bar{X}$ is normally distributed
BUT: What is the distn of X?
- no data here, so can't assess
- told in question X is normal
- sample size = 25, so even without knowing the X distribution is normal, the CLT should apply ... so it should be safe to assume the distribution of $\bar{X}$ is approx normal
$\sigma$ = 9: variability of drying time is the same with and without the additive

Our test statistic is still
$$Z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$$
but now, we have to take into account the sign.
[In two tailed tests we could ignore the sign.]
The observed value of the test statistic here is
$$z_{obs} = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = \frac{71.5 - 75}{9/\sqrt{25}} \approx -1.94$$
p-value = $P(Z \le -1.94) = 0.0262$
The RR is the left hand tail of the z-distn, since the alternative is < (less than).
Reject H0 at the 5% level of significance.
We can conclude that there is evidence at the 5% level of significance that the additive does decrease the average drying time of the paint.
We could also have compared $z_{obs}$ with the 5% one-tailed critical value (-1.645).
Reject H0 since $z_{obs} < -z_{0.05}$
(i.e. $z_{obs}$ lies in the rejection region)
Same conclusion.

Comparison of rejection regions
Two tailed test: H1: $\mu \ne \mu_0$
One tailed tests: H1: $\mu < \mu_0$ or H1: $\mu > \mu_0$
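The effect of the alternative hypothesis on the p-value can be sketched with one small function. The function name and the `alternative` labels are illustrative choices, not notation from the notes; the numbers are those of the paint example ($\bar{x}$ = 71.5, $\mu_0$ = 75, $\sigma$ = 9, n = 25).

```python
from math import sqrt
from statistics import NormalDist

def z_test_p_value(x_bar, mu0, sigma, n, alternative):
    """Return (z_obs, p-value) for a one-sample z-test.

    alternative is 'less', 'greater' or 'two-sided'.
    """
    z_obs = (x_bar - mu0) / (sigma / sqrt(n))
    lower_tail = NormalDist().cdf(z_obs)   # P(Z <= z_obs)
    if alternative == "less":
        return z_obs, lower_tail
    if alternative == "greater":
        return z_obs, 1 - lower_tail
    if alternative == "two-sided":
        return z_obs, 2 * min(lower_tail, 1 - lower_tail)
    raise ValueError("alternative must be 'less', 'greater' or 'two-sided'")

for alt in ("less", "two-sided", "greater"):
    z, p = z_test_p_value(71.5, 75, 9, 25, alt)
    print(f"H1 {alt}: z = {z:.2f}, p = {p:.3f}")
```

Only the 'less' alternative rejects H0 at the 5% level here. The one-tailed p-value is about 0.026 with the unrounded z, compared with 0.0262 from tables using z = -1.94.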
One tailed tests are typical when assessing, for example:
new drug to decrease blood pressure
new fertilizer to increase yield
new packaging to improve sales
new diet to reduce weight
new technique to increase reading speed
etc, etc
In each case, we are only interested in whether the new situation is better.
We don't care to distinguish between the new situation having no effect and being worse.

For a one tailed test:
H0: $\mu = \mu_0$
H1: $\mu < \mu_0$
If $\bar{x} > \mu_0$ then there is absolutely no evidence that $\mu < \mu_0$.
Obviously we will retain H0 with a very strong belief (no matter how large $\bar{x}$).
There is no use in formally doing the test.
For the paint example:
If $\bar{x}$ = 76.2: Retain H0 at any level of significance.
Similarly for H1: $\mu > \mu_0$
We will retain H0 automatically if $\bar{x} < \mu_0$.
BUT: There is often no right or wrong answer to the one versus two tailed test.
It depends on what your research question is.
e.g. In testing the effect of a drug on blood pressure:
The medico would almost certainly do a one tailed test - is the drug worth using to lower blood pressure?
A physiologist or a chemist might want to do a two tailed test - is the drug affecting blood pressure in any way?
If you don't have a reason, do a two tailed test.

By choosing between a one tailed or a two tailed test you are changing the rejection and non-rejection regions (and hence the p-value).
So you require a reason before you can do a one tailed test - the reason would be based on the question of interest, or the research hypothesis, as this dictates the H1.
Don't let the data suggest a one tailed test.
The only real difference in carrying out a one or two tailed test is the calculation of the p-value.
Everything else is the same.
Warning (2): So far, we have only seen confidence intervals which are two tailed: ($\bar{x} \pm$ something)
!! You can't do a one tailed test using a two-tailed confidence interval.

One-sided confidence intervals
From standard normal tables, we know that
$$P(Z > -1.645) = 0.95$$
But if $\bar{X}$ is normally (or approx) distributed:
$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$$
So, we can make one directional statements:
$$P\left(\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} > -1.645\right) = 0.95$$
$$P\left(-\mu > -\bar{X} - 1.645\frac{\sigma}{\sqrt{n}}\right) = 0.95$$
$$P\left(\mu < \bar{X} + 1.645\frac{\sigma}{\sqrt{n}}\right) = 0.95$$
This gives us an upper limit on the believable value of $\mu$.
We get an unbounded 95% confidence interval for $\mu$, being
$$\left(-\infty,\ \bar{X} + 1.645\frac{\sigma}{\sqrt{n}}\right)$$
This interval can be used to test the alternative hypothesis H1: $\mu < \mu_0$
If $\mu_0$ is in the interval, retain H0:
not enough evidence to conclude $\mu < \mu_0$
If $\mu_0$ is not in the interval, reject H0:
enough evidence to conclude $\mu < \mu_0$
If the value of $\mu_0$ is outside the 95% one-sided CI for $\mu$, we would have evidence that the population from which the sample was taken has a mean less than the value $\mu_0$.
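A greater than alternative hypothesis can also be tested:
For H1: $\mu > \mu_0$ and a general significance level $\alpha$,
$$P(Z < z_\alpha) = 1 - \alpha$$
$$P\left(\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < z_\alpha\right) = 1 - \alpha$$
$$P\left(-\mu < -\bar{X} + z_\alpha\frac{\sigma}{\sqrt{n}}\right) = 1 - \alpha$$
$$P\left(\mu > \bar{X} - z_\alpha\frac{\sigma}{\sqrt{n}}\right) = 1 - \alpha$$
($z_\alpha$ is positive - it is the value which cuts off an area of $\alpha$ in the upper tail.)
This gives us a lower limit on the believable value of $\mu$.
We get an unbounded 100(1-$\alpha$)% confidence interval for $\mu$, being
$$\left(\bar{X} - z_\alpha\frac{\sigma}{\sqrt{n}},\ +\infty\right)$$
This interval can be used to test the alternative hypothesis H1: $\mu > \mu_0$:
If $\mu_0$ is in the interval, H0 is retained.
If $\mu_0$ is not in the interval, H0 is rejected.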
Summary of one-tailed C.I.s

Testing H0: $\mu = \mu_0$ versus H1: $\mu < \mu_0$
e.g. H0: $\mu$ = 100, H1: $\mu$ < 100
P-value = $P(Z < z_{obs})$; reject if p-val $\le$ 0.05
100(1-$\alpha$)% CI for $\mu$ = $(-\infty,\ \bar{X} + z_\alpha \cdot \sigma/\sqrt{n})$
retain H0 if $\mu_0$ is in CI, eg $(-\infty, 102)$
reject H0 if $\mu_0$ not in CI, eg $(-\infty, 96.7)$

Testing H0: $\mu = \mu_0$ versus H1: $\mu > \mu_0$
e.g. H0: $\mu$ = 100, H1: $\mu$ > 100
P-value = $P(Z > z_{obs})$; reject if p-val $\le$ 0.05
100(1-$\alpha$)% CI for $\mu$ = $(\bar{X} - z_\alpha \cdot \sigma/\sqrt{n},\ +\infty)$
retain H0 if $\mu_0$ is in CI, eg $(98.3, +\infty)$
reject H0 if $\mu_0$ not in CI, eg $(102, +\infty)$

All the theory of two-tailed tests and confidence intervals holds.
The only difference is all the Type I error probability is being put into one tail, not two.

The experiment on the new paint gave the following summary data for drying time:
n = 25, $\bar{x}$ = 71.5, $\sigma$ = 9 (assumed)
Has the additive improved the drying time?
H0: $\mu$ = 75
H1: $\mu$ < 75
We need a CI with an upper limit, so
95% CI for $\mu$ = $(-\infty,\ \bar{x} + z_{0.05} \cdot \sigma/\sqrt{n})$
= $(-\infty,\ 71.5 + 1.645 \cdot 9/\sqrt{25})$ = $(-\infty,\ 74.461)$
75 is not in this interval, so we can reject H0.
There is evidence that the true mean is less than 75 minutes: market the new paint.
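The one-sided bounds for the paint example can be computed as follows. This is a minimal sketch with illustrative function names; all the tail probability goes into a single tail, so the critical value is $z_\alpha$ rather than $z_{\alpha/2}$.

```python
from math import sqrt
from statistics import NormalDist

def upper_bound(x_bar, sigma, n, confidence=0.95):
    """Upper limit of the one-sided interval (-inf, x_bar + z_alpha * se)."""
    z_alpha = NormalDist().inv_cdf(confidence)   # about 1.645 for 95%
    return x_bar + z_alpha * sigma / sqrt(n)

def lower_bound(x_bar, sigma, n, confidence=0.95):
    """Lower limit of the one-sided interval (x_bar - z_alpha * se, +inf)."""
    z_alpha = NormalDist().inv_cdf(confidence)
    return x_bar - z_alpha * sigma / sqrt(n)

# Paint example: x-bar = 71.5, sigma = 9 (assumed), n = 25, mu0 = 75.
ub = upper_bound(71.5, 9, 25)
lb = lower_bound(71.5, 9, 25)
print(f"95% upper bound: {ub:.2f}  (75 inside interval: {75 < ub})")
print(f"95% lower bound: {lb:.2f}  (75 inside interval: {75 > lb})")
```

With H1: $\mu$ < 75 the relevant interval is the one with the upper bound; 75 lies above it, so H0 is rejected. The lower-bound interval, which belongs to H1: $\mu$ > 75, contains 75, so that test would retain H0.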
Using Minitab for Height example:
> Stat > Basic Stats > 1-Sample Z

MTB > OneZ Height;
SUBC> Sigma 15;
SUBC> Test 175.

Test of mu = 175 vs mu not = 175
The assumed standard deviation = 15

Variable   N    Mean  StDev  SE Mean  95.0% CI             Z      P
Height    10  179.30   9.84     4.74  ( 170.00, 188.60)  0.91  0.365

Using Minitab for Paint example:
Minitab can also be used when you only have the summary statistics.
> Stat > Basic Stats > 1-Sample Z
Paint drying time example

Correct one tailed z test:
One-Sample Z
Test of mu = 75 vs < 75
                        95% Upper
 N   Mean  SE Mean      Bound      Z      P
25  71.50     1.80      74.46  -1.94  0.026

For a two tailed z test:
One-Sample Z
Test of mu = 75 vs not = 75
 N   Mean  SE Mean  95% CI              Z      P
25  71.50     1.80  (67.97,75.03)   -1.94  0.052

For the other one tailed z test:
One-Sample Z
Test of mu = 75 vs > 75
                        95% Lower
 N   Mean  SE Mean      Bound      Z      P
25  71.50     1.80      68.54  -1.94  0.974

Using the wrong alternative can lead to a different conclusion.

How to choose the direction of the alternative hypothesis
(From a discussion forum in 2015)
This is not always a simple choice.
What the null hypothesis direction is and what the alternative hypothesis direction is DEPENDS on what the purpose of the study is ... and sometimes it is not purely determined by that!
But ... think of the idea behind inference using hypothesis tests:
H0 is the "status quo" ... nothing is different from before, or from what is being claimed. You need to get evidence against this.
H1 is the "action hypothesis" ... that something has changed or is not as claimed. You need to get evidence for this.
Examples will be done in tutorials.
Summary:

All confidence intervals require the following:
- statement of confidence level and parameter involved: eg a 99% c.i. for $\mu_D$ is ...
- correct application of the CI calculations
- a meaningful conclusion (only if asked for one)
For confidence intervals, use a confidence level of 95% ($1-\alpha$) unless otherwise specified.

All hypothesis tests require the following:
- the null hypothesis: H0: parameter = value
- alternative hypothesis H1: parameter $\ne$ or < or > value
- significance level $\alpha$ = value
- a brief statement of any necessary distributional assumptions and justification (if required)
- reference to a CI if already evaluated; or application of the correct test (many more to come) [one or two sample z, t (pooled or unpooled), paired t, etc.]
- evaluation of the test statistic value (and df if appropriate)
- evaluation of the p-value (or an interval in which the p-value lies)
then
- statement of decision (Reject or Not reject)
- a meaningful conclusion (in terms of the question of interest, including direction of change)
For hypothesis tests, use a significance level of 5% ($\alpha$ = 0.05) unless otherwise specified.
A full hypothesis testing example

Consider the case of the loaded die as dealt with in Tutorial Exercises:

$$P(Y = y) = \frac{y}{21}; \quad y = 1, 2, 3, 4, 5, 6$$

It was shown that the expected value and variance of the value (Y) on the uppermost face of the die were:

$$E(Y) = \frac{13}{3} = 4\tfrac{1}{3}; \quad \mathrm{Var}(Y) = \frac{20}{9} = 2\tfrac{2}{9}; \quad \text{and} \quad \sigma_Y = \sqrt{\frac{20}{9}} \approx 1.490712$$

A sceptic (Eleanor) does not believe that the die is loaded in this particular way.
Eleanor tosses the die 240 times and records her results.
The total of the uppermost face values was equal to 993.

This information is to be used to test the null hypothesis that the die is loaded in the specified way.
That is, we will test the hypothesis that the average result on the uppermost face for the die is 4.33.
Assumptions are to be stated (with a brief justification, if required).
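The stated mean and variance can be checked directly from the probability function. The following is a minimal sketch using exact fractions; the variable names (`pmf`, `mean`, `var`, `sd`) are illustrative and not part of the notes.

```python
from fractions import Fraction
from math import sqrt

# Probability function of the loaded die: P(Y = y) = y/21 for y = 1..6
pmf = {y: Fraction(y, 21) for y in range(1, 7)}

mean = sum(y * p for y, p in pmf.items())                # E(Y)
var = sum(y**2 * p for y, p in pmf.items()) - mean**2    # Var(Y) = E(Y^2) - [E(Y)]^2
sd = sqrt(var)                                           # population standard deviation

print(mean, var, round(sd, 6))
```

Working in fractions avoids rounding until the final square root, so the results can be compared exactly with 13/3 and 20/9.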
Example
Hypothesis test: H0: $\mu = 4.333$ versus H1: $\mu \neq 4.333$ at $\alpha = 0.05$

Given: Distribution as stated with $\mu = 4\tfrac{1}{3}$, $\sigma^2 = \tfrac{20}{9}$

Dealing with the sample mean, the test statistic is

$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$$

The sample mean is $\bar{x} = \frac{993}{240} = 4.1375$

Assumption: $\bar{X}$ is approximately normally distributed.

Justification:
Here, X is discrete and skewed, so $\bar{X}$ will also be discrete and skewed.
However, the approximate normality for the sampling distribution of $\bar{X}$ is a reasonable assumption: the sample size n = 240 should be sufficiently large for the Central Limit Theorem to work here.

We need to use the normal approximation to the exact sampling distribution of $\bar{X}$, which has

$$\text{Mean} = E(\bar{X}) = 4\tfrac{1}{3}, \qquad se(\bar{X}) = \frac{\sigma}{\sqrt{n}} = \frac{\sqrt{20/9}}{\sqrt{240}} = \sqrt{\frac{1}{108}} \approx 0.096225$$
The sample mean $\bar{X}$ is discrete, so we can obtain a more accurate approximation from the normal if we use a continuity correction.

For the total, the cc would be 1/2, but we are dealing with the mean = total/240; here, the cc is (1/2)/240 = 1/480.

To obtain the p-value, we need the area in both tails ...
but the sample mean (4.1375) is less than the hypothesised mean (4.3333) ...
and we want to include our observed mean in the tail area ...
so the cc here is to add 1/480.

The observed value of the test statistic is

$$z_{obs} = \frac{\bar{x} + \frac{1}{2n} - \mu}{\sigma/\sqrt{n}} = \frac{4.1375 + \frac{1}{480} - 4.3333}{\sqrt{20/9}/\sqrt{240}} = \frac{-0.19375}{0.096225} \approx -2.0135$$

p-value $\approx 2 \times P(Z \le -2.0135) \approx 2 \times 0.0222 = 0.0444$

Reject the null hypothesis (at $\alpha$ = 5%).
The mean from the sample is significantly different from the hypothesized mean for the biased die.
Hence, there is sufficient evidence to refute the claim that this die is biased in the way claimed.
Evidence suggests the die has a smaller mean than stated.
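The calculation above can be reproduced numerically. This is a sketch only, using Python's standard-library `statistics.NormalDist` in place of normal tables; because the z value is not rounded before the lookup, the p-value differs slightly from the table-based 0.0444.

```python
from math import sqrt
from statistics import NormalDist

n, total = 240, 993          # number of tosses and total of the face values
mu0 = 13 / 3                 # hypothesised mean under H0
sigma = sqrt(20 / 9)         # population standard deviation under H0

xbar = total / n             # observed sample mean, 4.1375
se = sigma / sqrt(n)         # standard error of the sample mean
cc = 1 / (2 * n)             # continuity correction for a mean: (1/2)/n

# xbar is below mu0, so adding cc keeps the observed mean inside the lower tail
z_obs = (xbar + cc - mu0) / se
p_value = 2 * NormalDist().cdf(z_obs)   # two-sided test: double the lower tail area
reject = p_value < 0.05

print(round(z_obs, 4), round(p_value, 4), reject)
```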
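Some points to note:

(i) Without cc, we would get:

$$z_{obs} = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{4.1375 - 4.3333}{\sqrt{20/9}/\sqrt{240}} = \frac{-0.195833}{0.096225} \approx -2.035$$

p-value $\approx 2 \times P(Z \le -2.035) \approx 2 \times \tfrac{1}{2}(0.0212 + 0.0207) \approx 0.0419$

The p-value evaluated using the continuity correction is higher than the p-value evaluated without it. In this case the decision is exactly the same, THANKFULLY!

(ii) Working with the total, we get the same result as with the mean:

$$z_{obs} = \frac{\sum_{i=1}^{n} x_i + \frac{1}{2} - n\mu}{\sqrt{n}\,\sigma} = \frac{993 + \frac{1}{2} - 1040}{\sqrt{240}\,\sqrt{20/9}} = \frac{-46.5}{23.094011} \approx -2.0135$$

(iii) We can avoid the need to take account of whether $\bar{X}$ is larger or smaller than $\mu$ by using absolute values:

$$z_{obs} = \frac{|\bar{x} - \mu| - \frac{1}{2n}}{\sigma/\sqrt{n}} = \frac{|4.1375 - 4.3333| - \frac{1}{480}}{\sqrt{20/9}/\sqrt{240}} = \frac{0.19375}{0.096225} \approx 2.0135$$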
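The three variants in points (i) to (iii) can be compared side by side. The sketch below is illustrative only (the names `z_no_cc`, `z_total` and `z_abs` are not from the notes); it shows that the total-based and absolute-value forms give the same magnitude as the mean-based statistic with continuity correction.

```python
from math import sqrt
from statistics import NormalDist

n, total = 240, 993
mu0, sigma = 13 / 3, sqrt(20 / 9)
xbar, se = total / n, sigma / sqrt(n)

# (i) no continuity correction
z_no_cc = (xbar - mu0) / se

# (ii) working with the total: cc is 1/2 and the sd of the total is sqrt(n) * sigma
z_total = (total + 0.5 - n * mu0) / (sqrt(n) * sigma)

# (iii) absolute-value form: always shrink the distance from mu0 by 1/(2n)
z_abs = (abs(xbar - mu0) - 1 / (2 * n)) / se

p_no_cc = 2 * NormalDist().cdf(z_no_cc)          # two-sided p-value without cc
p_cc = 2 * (1 - NormalDist().cdf(z_abs))         # two-sided p-value with cc

print(round(z_no_cc, 4), round(z_total, 4), round(z_abs, 4))
print(round(p_no_cc, 4), round(p_cc, 4))
```

The continuity correction pulls the statistic towards zero, so the corrected p-value is always the larger of the two.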
From a 2015 discussion forum

Q1: For the continuity correction (dealing with the mean $\bar{X}$ and not the sum of the observations), ... how do you know when to add or subtract 1/(2n)?

A1: This requires a bit more thought ... but it is fairly simple.
For 'a' integer, to find the area:
- strictly greater than 'a' (> a), the cutoff is 'a+0.5'
- greater than or equal to 'a' ($\ge$ a), the cutoff is 'a-0.5'
- strictly less than 'a' (< a), the cutoff is 'a-0.5'
- less than or equal to 'a' ($\le$ a), the cutoff is 'a+0.5'
For a mean from a discrete distribution (which will be an integer divided by n), simply divide the value above by n.
For hypothesis testing, with a $\neq$ alternative we want the tail area (including our observed mean) DOUBLED. Using the absolute value saves having to look beforehand to see if $\bar{x} - \mu_0$ is < 0 or > 0.

Example: confidence interval:
The relevant confidence interval for the true mean of this die is:

$$95\%\ \text{c.i. for } \mu = 4.1375 \pm 1.96\,\frac{\sqrt{20/9}}{\sqrt{240}} = (4.1375 \pm 0.18860) = (3.949,\ 4.326)$$

From Minitab, we get:

One-Sample Z
Test of mu = 4.33333 vs not = 4.33333
The assumed standard deviation = 1.49071

  N    Mean  SE Mean        95% CI            Z      P
240  4.1375   0.0962  (3.9489, 4.3261)  -2.04  0.042
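The confidence interval for the die example can be recomputed as follows. This is a minimal sketch; the helper name `z_interval` is hypothetical, and the critical value comes from `NormalDist().inv_cdf` rather than the rounded 1.96.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(xbar, sigma, n, conf=0.95):
    """Two-sided z confidence interval for a mean when sigma is known."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # about 1.96 for 95%
    margin = z * sigma / sqrt(n)                   # margin of error
    return xbar - margin, xbar + margin

lower, upper = z_interval(993 / 240, sqrt(20 / 9), 240)
print(round(lower, 4), round(upper, 4))
```

The hypothesised mean 4.3333 lies just outside this interval, which agrees with rejecting H0 at the 5% level in the two-sided test.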
From a 2015 discussion forum

Q2: Is the continuity correction for t-values the same as it is for z-values?

A2: It is the same in application, but is rarely done, as we usually look at t-tables, but this is a good point!

The accuracy of the point estimate for a population mean is measured by the margin of error (which is half the length of the two-sided confidence interval). The 100(1-$\alpha$)% two-tailed confidence interval for a population mean is

$$\bar{X} \pm z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}$$

This requires knowing $\sigma$, the confidence level, and that X is normally distributed (or n is large enough to apply the CLT and assume an approximate normal distribution for $\bar{X}$).

If we wish to set the maximum distance of the estimate $\bar{X}$ from the target $\mu$ to be B (boundary), we get the inequality

$$z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} \le B \quad\Longrightarrow\quad n \ge \left(\frac{z_{\alpha/2}\,\sigma}{B}\right)^2$$

Recall that in past years, the height of male STAT171 students has followed a Normal distribution with $\mu$ = 175 cm & $\sigma$ = 15 cm.
If we wish to have a margin of error of no more than 5 cm in a 95% confidence interval for $\mu$ (this year's population average height), we must take a sample of at least size n, where

$$n \ge \left(\frac{z_{\alpha/2}\,\sigma}{B}\right)^2 = \left(\frac{1.96 \times 15}{5}\right)^2 = 5.88^2 = 34.5744$$

Hence, $n \ge 35$.
A sample size of 34 or less will cause the margin of error in estimating $\mu$ to be greater than 5 cm.

The 100(1-$\alpha$)% one-tailed confidence interval for a population mean is either

$$\left(-\infty,\ \bar{X} + z_{\alpha}\,\frac{\sigma}{\sqrt{n}}\right) \quad\text{or}\quad \left(\bar{X} - z_{\alpha}\,\frac{\sigma}{\sqrt{n}},\ +\infty\right)$$

Here, setting a limit on the one-sided bound results in the inequality

$$n \ge \left(\frac{z_{\alpha}\,\sigma}{B}\right)^2$$

For the paint drying example, if we want the limit of the CI to be within 4 minutes of the true average drying time, we would need

$$n \ge \left(\frac{z_{\alpha}\,\sigma}{B}\right)^2 = \left(\frac{2.33 \times 9}{4}\right)^2 = 5.2425^2 \approx 27.48$$

that is, a sample of at least 28 paint trials.
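Both sample-size calculations follow the same pattern and can be wrapped in one small function. This is a sketch under the assumptions above (known $\sigma$, normal or approximately normal sample mean); the function name `n_for_margin` is illustrative, and the critical values 1.96 and 2.33 are passed in exactly as used in the notes.

```python
from math import ceil

def n_for_margin(z_crit, sigma, bound):
    """Smallest integer n with z_crit * sigma / sqrt(n) <= bound."""
    return ceil((z_crit * sigma / bound) ** 2)

n_heights = n_for_margin(1.96, 15, 5)   # two-sided 95% CI, margin of error 5 cm
n_paint = n_for_margin(2.33, 9, 4)      # one-sided bound within 4 minutes

print(n_heights, n_paint)
```

Rounding up (rather than to the nearest integer) is what guarantees the margin of error does not exceed the bound.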
When designing an experiment or survey, the most common question asked is ...
What sample size should be used?

Two issues must be considered when determining a required sample size:

1) What is the minimum difference you'd be interested in finding?
For example, the population average used to be 0.75 and we'd be interested if we knew it was 0.73 or smaller, but not anything more (anything above 0.73 will not be actioned, such as manufacturing a new drug).

2) How sure do you need to be that you've found such a difference?
We usually want this to be fairly high, such as 95% or 99% etc.

The Orion Nebula
1300 light years away, 24 light years across.
To the naked eye (not a powerful tool) it is a dot, but with a good (powerful) telescope much can be distinguished.
[Images: the nebula with the naked eye, and with a fantastic telescope]

You are using a statistical telescope when running a hypothesis test: you need to know how strong it is.
The tool we need is power, and this should be borne in mind whenever doing hypothesis testing!
Recall from Topic 6 there are two types of error which can occur when testing a hypothesis:

Type I error: this occurs when H0 is true, and we wrongly declare it to be false:
$\alpha$ = P(Type I error) = P(Reject H0 | H0 true)

Type II error: this occurs when H0 is false, and we wrongly declare it to be true:
$\beta$ = P(Type II error) = P(Retain H0 | H0 false)

Power is the probability of NOT making a Type II error.
That is, Power is the probability we reject H0 when H0 is false.
Power is the conditional probability we make the right decision when there truly is something different from the null happening.

Power = P(Reject H0 | H0 false) = 1 - $\beta$

We would prefer power to be 100%, but due to inherent variability, that can only happen if we always reject, no matter what (even when H0 is true).
This would mean our Type I error rate ($\alpha$) was also 100%.
We need to be able to evaluate power in a given situation to see how we can increase the probability of finding a true difference without also increasing the probability of finding a false difference.
That is, for a fixed Type I error rate, how can we improve the power?

We will start by investigating how the hypothesis test itself is affected by things we can control, such as sample size and significance level.

Returning to the paint example:
H0: $\mu = 75$
H1: $\mu < 75$
$\bar{x}$ = 71.5 minutes, $\sigma$ = 9, n = 25

$$z_{obs} = \frac{71.5 - 75}{9/\sqrt{25}} \approx -1.94$$

p-value = P(Z $\le$ -1.94) = 0.0262. Reject H0 at the 5% level.
We concluded that there was evidence that the new additive reduced the average drying time of the paint.

What would have happened if we had obtained the same $\bar{x}$ of 71.5, but from a different sample size?
Would it affect the conclusions?
The sample size can affect your conclusions because the larger the sample size, the more confident we can be about the sample mean as an estimate of $\mu$.

For n = 5:

$$z_{obs} = \frac{71.5 - 75}{9/\sqrt{5}} \approx -0.870$$

p-value = P(Z $\le$ -0.870) $\approx$ 0.1922. Don't reject H0.
Now (with n = 5 rather than n = 25) we cannot conclude that the additive has a significant effect on the average drying time.

For n = 10,000:

$$z_{obs} = \frac{71.5 - 75}{9/\sqrt{10000}} \approx -38.89$$

p-value = P(Z $\le$ -38.89) $\approx$ 0.00. Reject H0 at any significance level.
Now we are very confident that the additive decreases the average drying time of the paint.

For very small samples it is often difficult to obtain a significant result even for large observed differences.
For very large samples you can quite often obtain a statistically significant result even for very small, even trivial, observed differences.

For n = 10,000 what is the minimum difference that we can declare significant at the 5% level of significance (for the paint drying example)?
For n = 10,000,

$$z_{obs} = \frac{\bar{x} - 75}{9/\sqrt{10000}}$$

will be declared significant if $z_{obs}$ is less than (or equal to) -1.645 (recall, H1 is $\mu < 75$).

So the cut-off value of $\bar{x}$ is given by:

$$\bar{x} - 75 \le -1.645 \times \frac{9}{\sqrt{10000}}$$
$$\bar{x} \le 75 - 1.645 \times 0.09 = 75 - 0.148 = 74.85$$

For n = 10,000, any sample mean of 74.85 minutes or less will be declared (statistically) significantly less than 75 at the 5% level of significance.
i.e. a 9 second (or more) difference will be declared significant at the 5% level.

Practical significance vs Statistical significance

If a result is statistically significant, we are saying that we are reasonably confident that the actual population mean is different from the hypothesized value.
That difference may or may not be of interest or importance to us.

We need to decide what difference is meaningful in terms of our experiment.
For the example: we need to think in terms of what difference in average drying time of the paint is marketable.
For example, we might decide to market the paint only if the decrease in average drying time is more than 2 minutes.
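The effect of sample size on the paint example, and the smallest sample mean that is declared significant, can be checked numerically. This is a sketch only: the helper names `z_and_p` and `cutoff` are illustrative, and p-values come from `statistics.NormalDist` rather than tables, so they differ slightly from the rounded table values.

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma, xbar = 75, 9, 71.5     # paint example: H0 mean, known sd, observed mean
std_normal = NormalDist()

def z_and_p(n):
    """Observed z and lower-tail p-value (H1: mu < mu0) for sample size n."""
    z = (xbar - mu0) / (sigma / sqrt(n))
    return z, std_normal.cdf(z)

def cutoff(n, alpha=0.05):
    """Largest sample mean that is still declared significant at level alpha."""
    return mu0 + std_normal.inv_cdf(alpha) * sigma / sqrt(n)

results = {n: z_and_p(n) for n in (5, 25, 10_000)}
for n, (z, p) in results.items():
    print(n, round(z, 3), round(p, 4), "reject" if p < 0.05 else "don't reject")

print(round(cutoff(10_000), 2))    # cut-off sample mean for n = 10,000
```

The same observed mean moves from "don't reject" to "reject at any level" purely because n changes, which is why a statistically significant result still has to be judged against what is practically meaningful.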
If we observe a difference of two minutes (or more), and it is significant, we will market the paint.
If we observe a difference of less than two minutes, we won't care if it is significant or not because we won't market the paint anyway.
So, if we observe a difference of two minutes or more, we want the test to be significant at the 5% level.

How large does n have to be to be able to declare such a result significant?
(later we may want to increase the sample size further, for other reasons)

We want:

$$z_{obs} = \frac{73 - 75}{9/\sqrt{n}} \le -1.645$$

Here the numerator carries the practical significance (diff = 2) and the critical value carries the statistical significance (p-value $\le$ 5%).

Just a matter of solving for n:

$$\frac{73 - 75}{9/\sqrt{n}} \le -1.645 \;\Longrightarrow\; \frac{-2\sqrt{n}}{9} \le -1.645 \;\Longrightarrow\; \sqrt{n} \ge 1.645 \times \frac{9}{2} = 7.4025 \;\Longrightarrow\; n \ge (7.4025)^2 \approx 54.80$$

Hence $n \ge 55$ (as n is integer).

If we have a sample size of 55 (or more) and we observe a decrease in average drying time of at least 2 minutes, we know in advance that the result will be significant at the 5% significance level.
If we have a sample size of less than 55 and we observe a decrease of exactly 2 minutes (or anything less) the result will not be significant at the 5% level.

If we are able to apply a z-test ($\sigma$ is known and X is normal or approx normal) for testing
H0: $\mu = \mu_0$ versus H1: $\mu < \mu_0$
at the 5% level, we will reject H0 if:

$$\frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} \le -1.645$$

where:
$\bar{x} - \mu_0$ is the minimum difference required for action;
$z_{crit}$ comes from the tolerance (how often we are prepared to be wrong when there is no difference);
$\sigma$ needs to be known (or estimated from past work).

So, once we have decided on what values for these are to be used, we solve the equation to obtain the minimum sample size that will achieve this.
Solve for n:

$$\frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} \le -1.645$$
In general, for a one sided z-test, the minimum sample size needed is:

$$n \ge \left(\frac{z_{crit}\,\sigma}{\bar{x} - \mu_0}\right)^2$$

If we do not know that X is normal, we have to assume (hope) that n will be large enough for the CLT (Central Limit Theorem) to work, so that the distribution of the sample mean will be approximately normal. Check this after the calculations are done.

Note: $\sigma$ will probably be unknown and an estimate will have to be used instead.
We can use the t-distribution to get the critical value, but this depends on the degrees of freedom, which depends on the sample size n, which is what we are trying to determine.
To solve the dilemma, use the $z_{crit}$ as this will give a rough idea in helping to plan the experiment.
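The general one-sided formula can be applied directly. The sketch below is illustrative (the function name `min_n_one_sided` is not from the notes) and assumes a z-test with known $\sigma$; it uses the exact normal quantile rather than the rounded 1.645.

```python
from math import ceil
from statistics import NormalDist

def min_n_one_sided(diff, sigma, alpha=0.05):
    """Smallest n for which an observed difference `diff` is significant
    in a one-sided z-test at level alpha (sigma assumed known)."""
    z_crit = NormalDist().inv_cdf(1 - alpha)        # about 1.645 for alpha = 0.05
    return ceil((z_crit * sigma / abs(diff)) ** 2)

n_paint = min_n_one_sided(diff=-2, sigma=9)         # 2-minute decrease, sigma = 9
print(n_paint)
```

Only the size of the difference matters in the formula, so the sign of `diff` is dropped with `abs`.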
We should plan what sample size would be required to achieve significance, before setting up the experiment.

Requirements:
1. An idea of what practical significance is desired, i.e. the minimum difference $\bar{x} - \mu_0$ that would be meaningful.
2. An idea of the likely value of chance variation $\sigma$ (from previous experiments, the literature, etc).
3. Use of the appropriate formula to determine the minimum sample size required to achieve significance.

For any test of significance, there are two possible correct outcomes, and two incorrect outcomes. Recall:

               H0 true         H0 false
H0 retained    correct         Type II ERROR
H0 rejected    Type I ERROR    correct

The errors are the outcomes of making a wrong decision.
They have CONDITIONAL probabilities:
$\alpha$ = P(Type I error) = Prob(Reject H0 | H0 true)
$\beta$ = P(Type II error) = Prob(Retain H0 | H0 false)

The probability of making the correct decision when H0 is false is called the POWER of the test.
Power = Prob(reject H0 | H0 false)
= 1 - Prob(retain H0 | H0 false)
= 1 - $\beta$
So, minimising $\beta$ and maximising power are the same thing.
The aim is to minimise $\beta$ for a fixed $\alpha$ (usually 5%).

$\beta$ (and hence Power) depends on:
- sample size (larger n gives higher power)
- $\alpha$ (higher $\alpha$ gives higher power), BUT this makes the Type I error rate higher
- whether the test is one or two tailed
- the true value of the parameters.
We cannot make $\alpha$ and $\beta$ smaller together, so we choose $\alpha$ as the maximum Type I error rate we are prepared to put up with.

Before carrying out the experiment, we decide we will be doing a one-sided test at the 5% significance level for testing:
H0: $\mu = 75$ versus H1: $\mu < 75$
we know: $\sigma$ = 9 and n = 25
Warning: We may observe sample statistics that are entirely different from what we are hoping/expecting to observe.
We can determine the rejection region in terms of the value of the sample mean.
When we carry out the test we know we will reject H0 if $z_{obs} \le -1.645$.
That is, reject if:

$$\frac{\bar{x} - 75}{9/\sqrt{25}} \le -1.645$$
$$\bar{x} \le 75 - 1.645 \times \frac{9}{\sqrt{25}}$$
$$\bar{x} \le 75 - 2.961 = 72.039$$
We know that if H0: $\mu = 75$ is true, the sampling distribution of the sample mean is:

$$\bar{X} \sim N\left(75, \frac{9^2}{25}\right) = N(75,\ 1.8^2)$$

[Figure: sampling distribution of the sample mean under H0.] The red area is 0.05, giving the rejection region in terms of values of the sample mean.

i.e. If we carry out a one-sided z-test at the 5% level of significance to test H0: $\mu = 75$ (vs <) with a sample size of 25, we will:
- reject H0 if $\bar{X} \le 72.039$
- not reject H0 if $\bar{X} > 72.039$

So we can calculate the power of the test in advance by calculating the probability that $\bar{X} \le 72.039$ for different values that $\mu$ (the true average drying time) might take.

Note that we have no idea:
- what the true value of $\mu$ is, or
- what value of $\bar{X}$ will be observed.
$$\text{Power} = P(\text{Reject } H_0 \mid H_0 \text{ false}) = P(\bar{X} \le 72.039 \mid \mu = 70)$$
$$= P\left(\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \le \frac{72.039 - 70}{9/\sqrt{25}} \;\Big|\; \mu = 70\right) = P\left(Z \le \frac{2.039}{1.8}\right) \approx P(Z \le 1.133) \approx 0.8708$$

We can drop the conditioning here, as the Z has a standard normal distribution (if we have conditioned on the correct value for $\mu$).

That is, if the true mean is 70 and we carry out a one-tailed test of H0: $\mu = 75$ at the 5% level of significance:
- there is a 5% chance that we would reject H0 when we shouldn't (we have set this rate).
- there is an 87% chance that we would reject H0 when we should, i.e. make the right decision.
- BUT there is a 13% chance that we would retain H0, i.e. make a Type II error.
We can plot the distribution of the sample mean:
- under H0: $\mu = 75$,
- or when true $\mu = 70$ (or any other specific value).
The test statistic will be significant if $\bar{X}$ is less than (or equal to) 72.039.

[Figure: the two sampling distributions with the "Reject H0" region to the left of 72.039; the area is 5% under the H0 curve and 87% under the $\mu = 70$ curve.]

What would happen to the power (area under the black density curve to the left of 72.04) if:
- the true mean is less than 70?
- the true mean is greater than 70 (but still less than 75)?
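When $\mu = 65$:

$$\text{Power} = P(\bar{X} < 72.039 \mid \mu = 65) = P\left(\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < \frac{72.039 - 65}{9/\sqrt{25}} \;\Big|\; \mu = 65\right) \approx P(Z < 3.911) \approx 1.0000 \text{ (close enough)}$$

When $\mu = 71$:

$$\text{Power} = P(\bar{X} < 72.039 \mid \mu = 71) = P\left(\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < \frac{72.039 - 71}{9/\sqrt{25}} \;\Big|\; \mu = 71\right) \approx P(Z < 0.577) \approx 0.7190$$

If $\mu$ is 71, there is a 72% chance of (correctly) concluding it is less than 75.

When $\mu = 73$:

$$\text{Power} = P(\bar{X} < 72.039 \mid \mu = 73) = P\left(\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < \frac{72.039 - 73}{9/\sqrt{25}} \;\Big|\; \mu = 73\right) \approx P(Z < -0.534) \approx 0.2946$$

If $\mu$ is 73, there is only a 30% chance of (correctly) concluding it is less than 75.

When $\mu = 74.9$:

$$\text{Power} = P(\bar{X} < 72.039 \mid \mu = 74.9) = P\left(\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < \frac{72.039 - 74.9}{9/\sqrt{25}} \;\Big|\; \mu = 74.9\right) \approx P(Z < -1.589) \approx 0.0559$$

Power does not exist when H0 is true, that is when $\mu = 75$ here, but the power approaches 0.05 (the significance level).
Recall,

$$P(\text{Type I error}) = P(\text{Reject } H_0 \mid H_0 \text{ true}) = P(\bar{X} < 72.039 \mid \mu = 75) \approx P(Z < -1.645) \approx 0.05$$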
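The power calculations above all follow one recipe: find the cut-off for the sample mean under H0, then find the probability of falling below it under the assumed true mean. A minimal sketch is given here; the name `power` is illustrative, and since the z values are not rounded before the lookup, the results agree with the table-based figures only to about two or three decimal places.

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma, n, alpha = 75, 9, 25, 0.05
se = sigma / sqrt(n)                                  # 9 / 5 = 1.8
cutoff = mu0 + NormalDist().inv_cdf(alpha) * se       # reject H0 if xbar <= cutoff

def power(true_mu):
    """P(sample mean <= cutoff) when the population mean is true_mu."""
    return NormalDist(true_mu, se).cdf(cutoff)

powers = {mu: power(mu) for mu in (65, 70, 71, 73, 74.9)}
for mu, pw in powers.items():
    print(mu, round(pw, 3))
```

Evaluating `power(75)` returns the significance level itself, which matches the remark that the power approaches 0.05 as the true mean approaches the hypothesised mean.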
Plotting Power versus possible values of $\mu$

Table of calculated powers

 mu     mu - 75   Power
 65      -10      1.000
 70       -5      0.871
 71       -4      0.719
 73       -2      0.295
 74.9     -0.1    0.056

and the plot of the points (the power curve would join them).

In Minitab: MTB > Stat > Power and Sample Size > 1-Sample Z..

[Figure: Plot of Power versus difference $\mu - \mu_0$, shown for more than one sample size.]

Power increases as the sample size gets larger: we are more likely to pick up a difference with a larger sample size than a smaller sample size.
Power increases as the true $\mu$ gets further away from the hypothesised mean $\mu_0$: we are more likely to pick up a bigger true difference in population means than a smaller difference.
We can calculate the Power for various sample sizes for a particular $\mu$.
We decided to market the paint if there was a difference of 2 minutes or more.
So we can calculate the sample size required to achieve a power of 20%, 30% etc, assuming that the true difference is 2 minutes less (i.e. true $\mu$ = 73).

From the Session window

Power and Sample Size
1-Sample Z Test
Testing mean = null (versus < null)
Calculating power for mean = null + difference
Alpha = 0.05  Assumed standard deviation = 9

       Sample  Target
 Diff    Size   Power  Actual Power
   -2       3    0.10      0.103843
   -2      14    0.20      0.208002
   -2      26    0.30      0.304417
   -2      40    0.40      0.405399
   -2      55    0.50      0.501273
   -2      73    0.60      0.600180
   -2      96    0.70      0.702800
   -2     126    0.80      0.802222
   -2     174    0.90      0.900859
   -2     220    0.95      0.950655
   -2     320    0.99      0.990107

With a sample size of 96, we get a power of 70.28%.
If n was 95 (or less), the power would be less than the pre-specified value of 70%.
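The sample sizes in the session output can be reproduced with the standard closed-form expression for a one-sided z-test, $n \ge \left((z_{\alpha} + z_{\beta})\,\sigma / |\text{diff}|\right)^2$, where $z_{\beta}$ is the normal quantile of the target power. This formula is not derived in the notes; the sketch below assumes it, and the function names are illustrative rather than anything Minitab exposes.

```python
from math import ceil, sqrt
from statistics import NormalDist

std = NormalDist()

def n_for_power(target, diff=-2, sigma=9, alpha=0.05):
    """Smallest n giving at least `target` power in a one-sided z-test."""
    z_alpha = std.inv_cdf(1 - alpha)      # critical value of the test
    z_beta = std.inv_cdf(target)          # quantile matching the target power
    return ceil(((z_alpha + z_beta) * sigma / abs(diff)) ** 2)

def actual_power(n, diff=-2, sigma=9, alpha=0.05):
    """Power actually achieved with sample size n."""
    return std.cdf(-std.inv_cdf(1 - alpha) + abs(diff) * sqrt(n) / sigma)

targets = [0.10, 0.20, 0.30, 0.40, 0.50, 0.60, 0.70, 0.80, 0.90, 0.95, 0.99]
sizes = [n_for_power(t) for t in targets]
for t, n in zip(targets, sizes):
    print(t, n, round(actual_power(n), 6))
```

Because n must be an integer, the achieved power is always a little above the target, which is why the output lists both a target and an actual power.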
From the Session window

Repeating the output from the previous slide:

 Diff    Size   Power  Actual Power
   -2       3    0.10      0.103843
   -2      14    0.20      0.208002
   -2      26    0.30      0.304417
   -2      40    0.40      0.405399
   -2      55    0.50      0.501273
   -2      73    0.60      0.600180
   -2      96    0.70      0.702800
   -2     126    0.80      0.802222
   -2     174    0.90      0.900859
   -2     220    0.95      0.950655
   -2     320    0.99      0.990107

To get the graph, cut and paste these values for Size and Actual Power into columns in the worksheet, then do
Graph > Scatterplot > With Connect Line

Different Example: z-test power curve for different $\alpha$
[Note: this chart was NOT created in Minitab]

[Figure: Power (1 - $\beta$) on the vertical axis against True Value of $\mu$ (118 to 128) on the horizontal axis, for H0: $\mu = 120$ versus Ha: $\mu > 120$, with one curve each for $\alpha$ = 0.10, $\alpha$ = 0.05 and $\alpha$ = 0.01.]

As the significance level is decreased, the Power decreases: the problem with making the Type I error rate low is that we are also making the power low, so there is LESS chance of picking up a true difference.
z-test power curve for different n
[Note: this chart was NOT created in Minitab]

[Figure: Power (1 - $\beta$) on the vertical axis against True Value of $\mu$ (118 to 128) on the horizontal axis, for H0: $\mu = 120$ versus Ha: $\mu > 120$, with one curve each for n = 45, n = 90, n = 180 and n = 360.]

Power increases as n increases.
Here, the power does not exist for values of $\mu$ < 120: what is actually plotted on the vertical axis is P(Reject), not power.

Paracetamol tablets are marketed as having exactly 500 mg per tablet. However, the actual amount per tablet varies.
Suppose it is known that the amount of paracetamol in each tablet follows a normal distribution with a population standard deviation of $\sigma$ = 4 mg.

We would be interested in a two tailed test here, as a difference in either direction would be a problem. That is,
H0: $\mu = 500$
H1: $\mu \neq 500$

At a 5% significance level, we would reject H0 for $z_{obs} \le -1.96$ or $z_{obs} \ge +1.96$.
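(cont ...)
We now need to convert this rejection region for $z_{obs}$ into one involving the sampling distribution of the sample average. We will first deal with a sample size of 25.

When we carry out the test we will reject H0 if $z_{obs} \le -1.96$ or $z_{obs} \ge 1.96$:

$$\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \le -1.96 \quad\text{OR}\quad \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \ge +1.96$$
$$\frac{\bar{X} - 500}{4/\sqrt{25}} \le -1.96 \quad\text{OR}\quad \frac{\bar{X} - 500}{4/\sqrt{25}} \ge +1.96$$
$$\bar{X} \le 500 - 1.96 \times \frac{4}{\sqrt{25}} \quad\text{OR}\quad \bar{X} \ge 500 + 1.96 \times \frac{4}{\sqrt{25}}$$
$$\bar{X} \le 498.432 \quad\text{OR}\quad \bar{X} \ge 501.568$$

Obtaining the power for $\mu = 499$ (say):

$$\text{Power} = P(\bar{X} < 498.432 \mid \mu = 499) + P(\bar{X} > 501.568 \mid \mu = 499)$$
$$= P\left(\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < \frac{498.432 - 499}{4/\sqrt{25}} \;\Big|\; \mu = 499\right) + P\left(\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} > \frac{501.568 - 499}{4/\sqrt{25}} \;\Big|\; \mu = 499\right)$$
$$= P(Z < -0.71) + P(Z > +3.21) \approx 0.2389 + (\sim 0) \approx 0.2389$$

[Figure: sampling distributions of the sample mean with the two "Reject H0" regions marked.] Power = area under the black curve outside the two blue lines.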
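For the two-tailed case the power is the sum of the two tail areas under the true sampling distribution. The following is a hedged sketch for the paracetamol example; the function name `two_sided_power` is illustrative, and the exact normal quantile is used in place of 1.96, so the result differs from 0.2389 in the third decimal place (partly because the small upper-tail area is included rather than treated as zero).

```python
from math import sqrt
from statistics import NormalDist

def two_sided_power(true_mu, mu0=500, sigma=4, n=25, alpha=0.05):
    """P(reject H0: mu = mu0) in a two-sided z-test when the mean is true_mu."""
    se = sigma / sqrt(n)
    z = NormalDist().inv_cdf(1 - alpha / 2)          # about 1.96 for alpha = 0.05
    lower, upper = mu0 - z * se, mu0 + z * se        # rejection cut-offs for xbar
    sampling = NormalDist(true_mu, se)               # distribution of xbar
    return sampling.cdf(lower) + (1 - sampling.cdf(upper))

print(round(two_sided_power(499), 4))
print(round(two_sided_power(499, n=50), 4))          # larger n for comparison
```

Calling the function with `true_mu=501` gives the same value as 499, reflecting the symmetry of the two-sided power curve about the hypothesised mean.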
In Minitab, specify a two tailed alternative under Options.

The power curve is symmetric around a difference of zero, and asymptotes to 1 the further the true mean is from $\mu_0 = 500$ here.

The power curve for n = 50 is above that for n = 25.
The power curve for n = 10 is below that for n = 25.
We can increase the power by increasing the sample size n (keeping $\alpha$, $\sigma$ the same): n is under our control.

The power curve for $\alpha$ = 0.10 is above that for $\alpha$ = 0.05.
The power curve for $\alpha$ = 0.01 is below that for $\alpha$ = 0.05.
We can increase the power by increasing the significance level $\alpha$ (keeping n, $\sigma$ the same): $\alpha$ is under our control.

We have also seen that power increases as the difference between the true mean $\mu$ and the hypothesised mean $\mu_0$ increases, but the true value of $\mu$ is NOT under our control!

Power of a test: Conclusions

1. The larger the significance level, $\alpha$, the higher the power of the test (but the higher the Type I error rate).
This is our decision, but it does have a cost.
2. The larger the sample size, the higher the power of the test.
This is our decision, but it does have a cost.
3. The larger the difference between the hypothesized value and the true value of the population parameter (mean or proportion, to be seen later), the higher the power.
We have no control over this!!

Often the true value of the parameter is not known, and we calculate the power of the test for a number of possible true values of the parameter under study. From this we can sketch a power curve.
The text book uses different terminology from the lecture notes.
For a z-test, it uses a large sample ($n \ge 30$) and estimates the population standard deviation with the sample standard deviation s.
This is technically incorrect.

In order to apply a z-test, we must have:
1. independent observations
2. sigma known (population st.dev.)
3. either:
(a) X normally distributed; or
(b) n sufficiently large that the sample mean $\bar{X}$ is approx normally distributed (application of CLT)

$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}, \qquad z_{obs} = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$$

If sigma is not known (the more realistic scenario), we have a t-test, which leads us on to Topic 7.
